CHAPTER I


INTRODUCTION 


The Iowa Placement Examinations, constructed by Dr. George D. Stoddard and others in 1925, have demonstrated their value in many colleges and universities. This is evidenced by the fact that certain schools have employed them repeatedly in the guidance and placement of students, and also by an increasing number of critical and constructive studies of the examinations. It was only to be expected that the first efforts in the construction of these tests should be confronted by certain problems for which objective data were lacking and which could be obtained only by a study of the results of the examinations secured over a period of years.


It is the purpose of this study to undertake a critical analysis of certain of these examinations and to reconstruct or modify them in the light of principles derived from the analysis. In some cases the reconstruction involves radical changes both in content and in testing techniques. In other cases the modifications are slight.


However, the investigation has aimed at a larger goal than that of improvement of existing materials. The experimentation has been concerned with developing new materials, new techniques, and new principles. Where old materials have been retained it will be shown that these are functioning at such a high level that it seemed desirable to retain them and to focus attention where it would be possible to bring about considerable improvement.


CHAPTER II 
A SUMMARY OF RELATED INVESTIGATIONS


The underlying principles of the placement examination are outlined 
by Dean C. E. Seashore, of the Graduate College of the University
of Iowa, and are given in the following passage (56):


“(1) It will be devoted to a single subject or field of knowledge [. . .] (6) the record of a general [. . .]lement this examination, but that is not essential, as a series of placement tests will be more sig[. . .]; (7) it will be prepared by [. . .] successful teacher and writer [. . .]n for a specific purpose and [. . .] the organization of sections of the class on the basis of this objective information about the character of the preparation and the natural aptitude for the subject.”

The construction of the Iowa Placement Examinations is described by Stoddard (61) in a study which outlines the methods employed in constructing the tests, such as selection of materials, validation of materials, reliabilities, etc. In the selection of content for the tentative edition of the examinations, objective criteria were not available. No one had previously attempted to answer the question as to which materials commonly taught in a specific high school course were most important for success in a college course in that subject. In this situation the judgment of experienced college teachers had to be relied upon. Wherever scientific studies provided a partial basis for the selection of content these were utilized.

The examinations which were developed are in two series, the apti- 
tude tests and the training tests. Each aptitude test is designed to 
measure the capacity of a student for a particular subject, even when 
the student has had no work in this particular field. It is essentially 
a specialized intelligence test. The training tests are to be used to 
determine how well the student has mastered the fundamental con- 
tent of the high school course in the field covered by the examination. 

The reliability coefficients of the examinations reported in the above 
investigation range from .87 to .94. The correlations of Placement 
Examination scores with first semester grades in a specific subject 
range from .26 to .95. Average correlations were: for the aptitude 
series, .50; for the training series, .60; and for both series combined, 
.65. Some coefficients were based on scores from small classes.

In comparing the results of the Placement Examinations with other 
measures commonly employed, the above investigation reports that 
partial and multiple correlations demonstrate the superiority of Place- 
ment Examinations over high school achievement and the traditional 
intelligence test as a device for predicting college success in a subject. 
At Case School of Applied Science pooled Placement Examination 
scores proved much superior to Army Alpha and pooled Council of 
Education Tests in prognosticating general academic success. 

In two articles Stoddard (59) (60) reports (1) a brief summary
of results obtained with the tentative edition and (2) a report showing 
for each test and for each college the number of students taking the 
test, and the mean, median, upper quartile, and lower quartile ob- 
tained. 

In reporting the results of the early use of the tests in English, mathematics, and chemistry at the University of Minnesota, Langlie (32) concludes that the revised edition was an improvement upon the tentative edition, as indicated by the coefficients of variability for the various tests. The coefficients of correlation of test scores with final grades in the courses corresponding to the tests ran higher for the revised (1925) tests than for the tentative (1924) editions. The study also indicates that the training tests should be given after a brief introductory review. This procedure would tend to remove spurious individual differences in ability arising from differences in lapse of time between high school and college.


In an earlier study, Langlie (31) attempted to determine whether the two concepts of “aptitude test” and “training test” are justified. He reports among other conclusions:


“(1) Each test is dependent to a fair degree upon intelligence, 
particularly the aptitude tests; 

(2) A particular aptitude is dependent upon training in that par- 
ticular line; 

(3) Aptitude (is) general intelligence plus training.” 

He concludes further that “there seems to be no real differentia- 
tion between aptitude and training...” 

In commenting on Langlie’s results, given above, Stoddard (63) states: “The first statement simply points out that intelligence as measured by tests has something in common with aptitude for performance in a school subject, and with performance in that subject. This is to be expected, especially in view of the fact that the aptitude examinations are in reality a special type of intelligence test . . . Langlie’s second statement is essentially fallacious, for it arises out of an accident of sampling. All students who took the aptitude tests had really studied the subject matter for which the tests were designed to measure aptitude. For example, the same students took both the Chemistry Aptitude and Chemistry Training Examinations. Since they had studied chemistry in high school there was a correlation between chemistry aptitude and chemistry performance. But the Chemistry Aptitude Examination is designed principally for those with no training in chemistry; when given to such students it surely becomes meaningless to say that aptitude depends upon training . . . Finally, Langlie’s third statement which cannot be true with respect [. . .]

“To the question, Has a real distinction been made between aptitude and training (for a particular subject), it may be answered that the correlations between the two series are much lower than the reliabilities of the examinations, and that the aptitude tests can all be given to students thoroughly unfamiliar with the school subject.”

A later study by Langlie (33) may be summarized as follows: 

1. The placement tests reveal individual differences to an un- 
usually high degree. The 1925 edition is greatly improved over the 
1924 edition in this respect. 

2. As a check on the classification of students in the mathematics 
and chemistry departments, the training tests are more effective than 
are the aptitude tests. 

3. Sectioning classes on the basis of ability could be done fairly 
successfully with scores on the aptitude and training tests as guides. 

4. The use of information gained from test scores would be much 
more effective and accurate if scholastic grades were made more 
reliable and objective. 

In an article discussing the uses of the examinations, Stoddard (62) 
points out that prognosis is not an end in itself. In this connection he
states: “The Iowa Placement Examinations are less a prognosis test 
than an educative procedure. Their aim is not primarily to predict 
academic success, but to render its attainment more likely; that is, 
to give aid in the setting up of educational conditions such that sound 
principles of selection, class-sectioning, and curriculum organization 
may be more effectively applied to the securing of maximum perform- 
ance on the part of each student.” 

The most detailed and comprehensive analysis of the predictive 
power and of the specificity of the examinations has been made by 
Hansen (21). Data were utilized which were secured from twenty- 
eight colleges and universities with a large number of cases for each 
test studied. The results for each test were correlated with those for 
other tests and with grades in all subjects where available. In comparing the correlation coefficients secured, Hansen was especially concerned with the specificity of the examinations. One question which runs throughout the study is: Do the tests predict success in the specific subject for which they are designed better than in other subjects?

His results may be summarized in part as follows: 

1. The Iowa Placement Examinations are superior to intelligence tests for predictive purposes. Correlations between Placement Tests and grades in specific subjects (exclusive of the Physics Aptitude Test) range from .438 to .545 with an average of .485, in contrast with the correlations between the Thorndike Intelligence Test and grades in specific subjects, which range from .162 to .528 with an average of .30.
2. The training tests are shown to be somewhat superior to the aptitude tests for the purpose of prognosis. According to this study, the aptitude tests (exclusive of Physics Aptitude) correlate about .48, and the training tests about .51, with grades in their own subjects. These average correlations are somewhat lower than those which have been reported by Stoddard; namely, .50 for the aptitude tests and .60 for the training tests.

3. Aptitude in one subject may or may not imply aptitude in another subject, depending on the extent to which the compared subjects possess common skills or mental functions.

In the same study Hansen gives the average correlation coefficient for each test with first semester grades in the subject represented by the test. These are as follows:


                                            Number      Number of
Name of Examination           Average r     of Cases    Colleges
English Aptitude              .445          3064        24
Mathematics Aptitude          .438          2671        32
Chemistry Aptitude            .514          2891        29
Physics Aptitude              .397           789         7
Foreign Language Aptitude     .484           642         7
English Training              .493          3450        28
Mathematics Training          .479          2735        25
Chemistry Training            .545          1655        18
Physics Training              .534           819         4
French Training               .519           300         4
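The text above summarizes these coefficients as about .48 for the aptitude tests (excluding Physics Aptitude) and about .51 for the training tests, but does not say how they were combined. As a minimal sketch for checking such summaries, the snippet below takes both a simple mean and a case-weighted mean of the tabled values. The helper name `series_means` and the weighting choice are assumptions for illustration, not Hansen's procedure, and averaging r values directly (rather than, for example, through Fisher's z transformation) is itself only an approximation.

```python
# Hansen's average correlations with first semester grades, copied from the
# table above as (average r, number of cases, number of colleges).
HANSEN = {
    "English Aptitude": (0.445, 3064, 24),
    "Mathematics Aptitude": (0.438, 2671, 32),
    "Chemistry Aptitude": (0.514, 2891, 29),
    "Physics Aptitude": (0.397, 789, 7),
    "Foreign Language Aptitude": (0.484, 642, 7),
    "English Training": (0.493, 3450, 28),
    "Mathematics Training": (0.479, 2735, 25),
    "Chemistry Training": (0.545, 1655, 18),
    "Physics Training": (0.534, 819, 4),
    "French Training": (0.519, 300, 4),
}


def series_means(table, series, exclude=()):
    """Return (unweighted mean r, case-weighted mean r) for one series.

    Hypothetical helper: it averages the tabled coefficients directly rather
    than re-pooling the raw data, so it only approximates a combined r.
    """
    rows = [v for name, v in table.items()
            if name.endswith(series) and name not in exclude]
    unweighted = sum(r for r, _, _ in rows) / len(rows)
    weighted = sum(r * n for r, n, _ in rows) / sum(n for _, n, _ in rows)
    return unweighted, weighted


aptitude = series_means(HANSEN, "Aptitude", exclude=("Physics Aptitude",))
training = series_means(HANSEN, "Training")
print(f"Aptitude (excl. Physics): {aptitude[0]:.3f} unweighted, {aptitude[1]:.3f} by cases")
print(f"Training: {training[0]:.3f} unweighted, {training[1]:.3f} by cases")
```

Weighting by cases gives more influence to the tests administered most widely; which of the two summaries is more appropriate depends on whether each test or each student is treated as the unit.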


In a chapter dealing with the prediction of success in college, after a survey of the studies in this field, Symonds (68) comments upon the results of the Iowa Placement Examinations as follows: “The results of this work at Iowa are remarkable.” He then proceeds to outline the results which have already been given above.


Thurstone (70) in a similar manner states, “The Iowa Placement 
Examinations are perhaps the best form of objective content examina- 
tion available” (referring to the training tests). In regard to the 
aptitude tests he continues, “The Iowa Aptitude Tests do not depend 
so much on training as regular content examinations, but most of them 
are more influenced by training than the typical psychological examinations. . . .
Very likely the best results in the long run will be
obtained by a combination of tests measuring training and aptitude.” 

A minor study by Bear (2), based on but 38 cases, reports that 
the Physics Aptitude test is “not a measure of physics ability, but 
one of general intelligence.” His conclusion is based on a much
higher correlation with the average grade (.64 ± .06) in all subjects
than with physics grades (.25 ± .10). The results are hardly valid
because of the small number of cases employed. Hansen, in his study 
described above, did find that the Physics Aptitude test is consid- 
erably less effective than the other tests of the series. However, the 
average correlation coefficient based on results from seven colleges and 
universities over a two year period is much higher than the one 
reported by Bear. 
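The ± figures quoted from Bear are consistent with the classical probable error of a correlation coefficient, PE = 0.6745 (1 - r²) / √N. Treating that formula as an assumption about how the figures were obtained, the short sketch below recomputes them from N = 38 and shows how wide the sampling error is at that size.

```python
import math


def probable_error_r(r, n):
    """Probable error of a Pearson r from n cases: 0.6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


# Bear's two coefficients, both based on the same 38 students.
for label, r in [("average grade", 0.64), ("physics grade", 0.25)]:
    print(f"{label}: r = {r:.2f} +/- {probable_error_r(r, 38):.2f}")
```

Because the probable error shrinks only with √N, coefficients from a class of 38 carry roughly twice the error of those from samples of about 150 to 200 cases.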

In two articles Cornog and Stoddard (11) (12) describe the place- 
ment examinations in chemistry and indicate some of their uses. The 
results show that students in beginning chemistry may be effectively 
sectioned by the use of the placement examinations. A large proportion 
of the students whose scores are in the upper quartile on the examinations
secure A or B grades with few or no failures. For those
whose test scores are in the lower quartile on the examinations the 
situation as to grades received is reversed. 


In a more recent article Cornog and Stoddard (13) report further 
results with students in freshman chemistry courses. The study pre- 
sents data to support the following conclusions: (1) the best scores 
made by college freshmen are virtually the same as the best scores 
made by high-school students; while the poorest scores made by high- 
school students are but little less than the poorest scores made by 
freshmen. (2) The range and general distributions of scores are 
similar in both groups. (3) There is a real difference of performance 
in favor of the freshmen but its magnitude is not impressive. (4) 
The performance of the high-school group is especially noteworthy when 
it is recalled that the summer months had intervened between. 

A number of studies have been made in which one or more of 
the Iowa Placement Examinations were employed either as tools in 
sectioning classes, as a means of detecting students in need of special 
attention because of deficiencies, or to measure the efficiency of 
various methods of instruction. Some of these involve case studies 
and as a result cannot be summarized adequately in the space avail- 
able here. Further they are less directly related to the purpose of 
this dissertation than those of a critical nature. For these reasons 
these investigations will be treated briefly. 

An investigation by Lemon (34) describes an experiment on the effectiveness of individual student analyses and remedial instruction.
An experimental and a control group were matched on the basis of 
Placement Examination scores, intelligence scores, and various environmental
conditions. The experimental group was studied further as to read- 
ing abilities, methods of study, interest in work, etc. 

The experimental group showed improvement in average scholarship 
and in the number remaining in college at the end of the second 
semester. In the case of the control group seventeen out of fifty-three 
were eligible for registration at the beginning of the sophomore year, 
while in the experimental group thirty-two out of fifty-three were eli- 
gible. The study sets up a plan of student guidance and analysis which 
would aim to reduce the high percentage of elimination among begin- 
ning college students. 

Remmers (49) outlines another remedial study of potentially and 
actually failing students. This investigation deals with the lower
twenty-five per cent of the freshmen at Purdue University as measured 
by various tests. The problem attacked is that of the possibility of 
improving scholastic achievement by means of directed study. The 
methods used are as follows: (1) a diagnostic investigation in terms 
of tests and interviews; (2) a division of the students into two com- 
parable groups with special treatment of one of these by means of 
directed study especially in English and mathematics, the other group 
serving as a control group. 

The remedial aspects of the study produced results which indicate 
that the procedure employed held seventy per cent of the experimental 
group as against fifty-two per cent of the control group in the Uni- 
versity throughout the year. The mean differences are in all cases 
slight, but usually in favor of the experimental group. 

Young and Van der Beke (76) employed the French Training test 
in an experiment in first year French. This experiment was conducted 
on the assumption that students will make more rapid progress in 
reading ability in the long run if at the beginning of the course they 
are tested for their knowledge of certain important items, and if steps 
are taken to remedy the defects of the individual student. The Place- 
ment Examination, French Training, was employed and a series of 
remedial drills given to overcome weaknesses in student preparation. 
At the end of the semester the experimental group had outdistanced 
six other groups which were taught without preliminary testing. 

In an investigation at the University of Illinois, Tharp (69) found the Iowa Placement Examinations effective in placing students at their proper level of ability. He reports, “The coefficient of correlation of the semester grades with these scores was .66 ± .02. There were 37 per cent of the cases correctly placed and 47 per cent missed by one place, or, 84 per cent within an error of only one place.” (By one place is meant one letter grade.) The above quotation refers to the French Training test.

The results with the Foreign Language Aptitude test were somewhat 
lower. In this case 75 per cent were correctly placed within an error 
of one letter grade. 

Tharp’s conclusions definitely approve the sectioning of students 
on the basis of standardized achievement tests. Of the tests employed 
none proved superior to the Iowa Placement Examinations. A gen- 
eral intelligence test proved far less effective than the achievement or 
aptitude tests. This test gave but 42 per cent correctly placed within 
one letter grade, in comparison with 75 per cent for the Foreign 
Language Aptitude test and 84 per cent for the French Training test. 
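Tharp's percentages count a student as correctly placed when the grade indicated by the test equals the semester grade, and as missed by one place when the two differ by one letter grade; for French Training, 37 + 47 = 84 per cent fall within one place. How test scores were converted into indicated grades is not described here, so the sketch below assumes that step is already done. The five-letter scale, the helper `placement_accuracy`, and the ten students' grades are all hypothetical.

```python
GRADE_SCALE = "ABCDE"  # hypothetical five-step letter scale, best grade first


def placement_accuracy(predicted, actual):
    """Return (% placed exactly, % placed within one letter grade)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    # Distance in letter grades between the placement and the semester grade.
    gaps = [abs(GRADE_SCALE.index(p) - GRADE_SCALE.index(a))
            for p, a in zip(predicted, actual)]
    exact = 100 * sum(g == 0 for g in gaps) / len(gaps)
    within_one = 100 * sum(g <= 1 for g in gaps) / len(gaps)
    return exact, within_one


# Invented grades for ten students; not Tharp's data.
predicted = list("ABBCCCDDEC")
actual = list("BBCCDCDEEA")
exact, within_one = placement_accuracy(predicted, actual)
print(f"exact: {exact:.0f}%, missed by one: {within_one - exact:.0f}%, "
      f"within one: {within_one:.0f}%")
```

The "within one place" figure is simply the exact percentage plus the percentage missed by one letter grade, which is how Tharp's 84 per cent is composed.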

Foster (15) in studying the improvement of freshman students in 
the mechanics of written composition employed along with other meas- 
ures the English Training Examination, Revised B. The tests were 
given at the beginning and again at the end of the semester. Gains 
were computed not only for the test as a whole but for separate parts 
and for categories within parts, e.g., in the punctuation section gains 
were noted in specific punctuation situations. But few of Foster’s 
results are of interest in connection with this dissertation. Of greatest 
significance are the following: 

1. “The detailed item analysis of performance on Part 4 (sentence 
structure) of the English Training Test indicated that this part is 
neither valid nor reliable for purposes of measurement.” 

2. “If the writer’s count of punctuation ‘errors’ in the written 
themes of freshmen is used as a criterion, Part 2 (punctuation) of the 
English Training Test appears to be fairly well balanced with respect 
to the different categories of punctuation situations which it contains. 
It is suggested, however, that the number of items involving the pos- 
sessive case of nouns be reduced somewhat, that the number of items 
involving independent elements and series of elements be increased, 
and that the number of items involving so-called ‘sentence sense’ be 
reduced considerably.” 

That Part 4 is neither valid nor reliable is in agreement with re- 
sults secured by methods to be reported in a later chapter. That the 
same results should be found by two investigators from varied ap- 
proaches adds to the significance of the conclusion. 
In a recent monograph Hammond and Stoddard (20) thoroughly
review the studies of the utility of the Iowa Placement Examinations 
in a number of engineering colleges. The purposes of the examinations 
are outlined, followed by a discussion of their validity and reliability. 
Many of the tables are taken from studies which have been previously 
presented in this chapter. 


The various studies summarized above would seem to warrant the 
following conclusions: 


1. The Iowa Placement Examinations have demonstrated their value 
as instruments to be employed in student personnel administration. 


2. The average correlation with grades reported by Stoddard for the training series, .60, is somewhat higher than the averages based on a large number of institutions over a longer period of time. It would seem that a safer estimate would be that the average correlation for the aptitude series is +.45 and for the training series +.50. As the Physics Aptitude test averages somewhat lower in predictive power, it has not been considered in the above estimates.


3. The distinction between aptitude and training is justified, first because a student with no training in a given subject may take the test for aptitude in that subject, and second because each aptitude test in general predicts success in its own field better than in any other. This also indicates the specificity of the aptitude tests. In addition the correlations between the aptitude and training series are much lower than their reliability coefficients, which is evidence that they do not measure the same thing.


CHAPTER III


GENERAL STATEMENT OF METHOD AND PROCEDURE 


A general statement of method and procedure which will apply 
equally well to all the examinations studied is not possible because 
of variable factors from test to test. However, the analytical steps 
were essentially the same for all the examinations, the greatest differ- 
ences entering into the construction of new materials and experimenta- 
tion with new techniques. 

The following preliminary questions were raised at the outset of 
the investigation: 

1. How well do the tests predict college achievement in specific 
subjects? This involves a survey of all results secured to date for 
each test. 

2. What are the relative values of each sub-test in prediction? 

3. What are the characteristics of those items which are most 
discriminating in predictive power? 

4. Can fundamental principles based on this item analysis be es- 
tablished which may be generally applied in the construction of 
placement tests? 

5. How can the new materials best be combined with those employed in the present tests? Where the reconstruction has been so radical that an entirely new examination has emerged, the problem will be to show that the new is superior to the old as a whole and in parts.

6. Is it possible in the case of the training series to construct tests which predict semester grades well and at the same time are valid measures of high school achievement in the subject concerned? This will be referred to as “double validation.” In another way the question could be asked: Are the requirements of a placement test which aims to predict superior or inferior achievement in a college course opposed to the principles which must be met by a well-balanced high school achievement test in the subject under consideration? If so, then one type of test may not be satisfactorily substituted for the other. To illustrate, the Spanish Training Test may be considered. The content of this test is not valid throughout, especially in Part I. However, it predicts success in Spanish very well (for tests of this type). To make the content of this test more valid as an achievement test without reducing its predictive power would add greatly to the utility of the test and would meet the adverse criticisms which have been offered.

IOWA STUDIES IN EDUCATION


7. Can the training tests be made sufficiently flexible to cover a grade range from the eleventh grade in high school to the beginning of the second year of college, in those cases where subjects are offered consecutively, as in French or Spanish?


8. Is it possible to meet the requirements of the broad range referred to above and at the same time maintain high predictive power as a college placement examination?


9. Would it be advisable to increase the objectivity of certain parts, e.g., by the substitution of multiple-choice responses for those of the recall type?


10. As a measure of aptitude how well do tests function which 
demand the integration of unfamiliar principles at higher and higher 
levels? For example, a principle may be explained and its application 
called for, then to this another principle may be added and problems 
presented demanding the use of both principles, then a third principle, 
etc. At each step all the previous principles may be involved or selec- 
tion may be demanded. In this investigation only a preliminary study 
of this has been possible. 


11. In the aptitude series what part does reading ability contribute 
to the predictive power of the tests? 


The above questions involve (1) a search for new principles which may be utilized in the construction of placement tests by future investigators, (2) a consideration of the relative effectiveness of certain techniques of testing, such as recall, multiple choice, etc., (3) a study of validity from the points of view of a placement test and of an achievement test, (4) a consideration of possible ranges of utility, (5) a study of reliabilities, and (6) the internal construction of the tests so that each item is effectively contributing to the discrimination of superior and inferior levels of student ability.
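The last point, that each item should help separate superior from inferior students, can be made concrete with a simple upper-lower comparison. The sketch below is only an illustration, not the procedure used in this study: it ranks students by a numeric criterion such as semester grade points, takes an upper and a lower group (the `fraction` parameter is a hypothetical choice), and reports the percentage answering each item correctly in each group. The function name `discrimination` and the data are invented.

```python
def discrimination(responses, grades, fraction=0.5):
    """Percent correct in the upper and lower criterion groups, item by item.

    responses: one list of 0/1 item scores per student.
    grades:    one numeric criterion value per student (e.g., grade points).
    fraction:  share of students in each extreme group (hypothetical choice,
               at most 0.5 so the two groups never overlap).
    Returns a list of (upper %, lower %, upper minus lower) per item.
    """
    if len(responses) != len(grades) or len(responses) < 2:
        raise ValueError("need at least two students, one grade each")
    if not 0 < fraction <= 0.5:
        raise ValueError("fraction must be in (0, 0.5]")
    # Rank students from highest to lowest criterion value.
    order = sorted(range(len(grades)), key=lambda i: grades[i], reverse=True)
    k = max(1, int(len(order) * fraction))
    upper, lower = order[:k], order[-k:]
    result = []
    for item in range(len(responses[0])):
        up = 100 * sum(responses[i][item] for i in upper) / k
        low = 100 * sum(responses[i][item] for i in lower) / k
        result.append((up, low, up - low))
    return result


# Invented example: four students, two items, grades on a 4-point scale.
answers = [[1, 1], [1, 0], [0, 1], [0, 0]]
grade_points = [4, 3, 2, 1]
for n, (up, low, diff) in enumerate(discrimination(answers, grade_points), start=1):
    print(f"item {n}: upper {up:.0f}%, lower {low:.0f}%, difference {diff:.0f}")
```

In this toy case the first item separates the two groups completely, while the second is answered equally often by both and so contributes nothing to discrimination.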
ANALYTICAL PROCEDURES
The steps in the analysis of the various examinations will be shown 
in specific detail for each test in the section devoted to it. The gen- 
eral outline is indicated below. 
1. One of the first steps consisted in getting together for each 
test the data which would be most useful in evaluating its performance. 


a. All of the available coefficients of correlation for Form A of each test¹ with first semester grades in the specific field corresponding to the test, for various colleges, were secured in order to determine the average correlation with grades. These were taken from various studies and from data on file which have not been published. The average, highest, and lowest coefficients are presented in Table 1.


TABLE 1 
AVERAGE, HIGHEST, AND LOWEST CORRELATION COEFFICIENTS FOR IOWA PLACEMENT 
EXAMINATIONS, REVISED A, WITH FIRST SEMESTER GRADES, YEARS 
1925-1927 (INCLUSIVE) 


Examination   No. of Schools   Average   Highest   Lowest
CA-1          30               .473      .682      .230
CT-1          23               .534      .689      .250
EA-1          27               .415      .690      .298
ET-1          29               .486      .702      .436
MA-1          34               .451      .654      .281
MT-1          29               .486      .675      .181
FA-1          [?]              .548      .732      .269
FT-1           6               .478      .649      .280
ST-1           3               .527      .573      .484
PA-1           7               .398      .624      .280


b. The reliability coefficients (Form A) reported by Hammond and 
Stoddard (20) are shown in Table 2. 


¹Unless otherwise indicated all statements apply to Form A. In all tables which follow the various tests will be identified in the following manner:

CA-1 = Chemistry Aptitude, Series 1, Form A

CT-1 = Chemistry Training, Series 1, Form A

EA-1 = English Aptitude, Series 1, Form A

ET-1 = English Training, Series 1, Form A

FA-1 = Foreign Language Aptitude, Series 1, Form A

FT-1 = French Training, Series 1, Form A

ST-1 = Spanish Training, Series 1, Form A

MA-1 = Mathematics Aptitude, Series 1, Form A

MT-1 = Mathematics Training, Series 1, Form A
TABLE 2


RELIABILITY OF THE IOWA PLACEMENT EXAMINATIONS


       Reliability   Number     Standard         Probable Error      PE Score
Test   Coefficient   of Cases   Deviation (SD)   of Score (PE)       / SD
CA-1   0.88          100        17.5             4.0                 0.23
CT-1   0.93          100        28.0             5.1                 0.18
EA-1   0.82          100         9.2             2.6                 0.29
ET-1   0.90          100        34.3             7.3                 0.21
FA-1   0.97          100        26.7             3.1                 0.11
FT-1   0.93          100        28.1             5.0                 0.18
ST-1   0.82          100        16.7             4.8                 0.29
MA-1   0.86          100         7.0             1.7                 0.24
MT-1   0.88          100        10.4             2.4                 0.23
PT-1   0.85          100        24.4             6.4                 0.26
PA-1   0.89          100        19.0             4.2                 0.22
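The last two columns of Table 2 agree closely with the classical probable error of an obtained score, PE = 0.6745 × SD × √(1 - r), where r is the reliability coefficient; the ratio PE/SD therefore depends on r alone. The formula is supplied here as an assumption about how the columns were computed. The sketch below checks it against two rows, and the remaining rows agree to within one unit in the last place.

```python
import math


def probable_error_of_score(sd, reliability):
    """Classical probable error of an obtained score: 0.6745 * SD * sqrt(1 - r)."""
    return 0.6745 * sd * math.sqrt(1 - reliability)


# Two rows from Table 2: (test, reliability coefficient, standard deviation).
for test, r, sd in [("ET-1", 0.90, 34.3), ("PT-1", 0.85, 24.4)]:
    pe = probable_error_of_score(sd, r)
    print(f"{test}: PE = {pe:.1f}, PE/SD = {pe / sd:.2f}")
```

Read this way, the PE/SD column compares tests on a common footing: FA-1, with a reliability of .97, has the smallest relative error of measurement, and EA-1 and ST-1, at .82, the largest.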


2. The reliability coefficients for each part of each test (Form A) were determined for comparison with the reliabilities of new parts which may be considered as possible substitutes for sections of the present examinations. These are given for each test in the chapter reporting the investigation of that particular examination.


3. In securing a sample for each test to be analyzed, papers were obtained from a number of colleges and universities. This was done to give a representative sampling. In every case papers were secured from four or more colleges. The analysis was based on Form A because it had been more widely used and more comparative data were available. To ensure further representativeness of the sampling, the mean (average) and standard deviation for each test were computed and compared with the norms based on several thousand cases. The Pearson product-moment correlation coefficient for test scores with first semester grades in the subject concerned was determined to make certain that the correlation was not unusually low or unusually high. Table 3 shows the results for each test compared with the norms given in the manual² which accompanies the Iowa Placement Examinations.


²May be secured from the Extension Division, University of Iowa, Iowa City, Iowa.


TABLE 3
MEANS, STANDARD DEVIATIONS, AND PEARSON CORRELATION COEFFICIENTS WITH FIRST
SEMESTER GRADES IN THE SUBJECT FOR SAMPLES TO BE ANALYZED COMPARED WITH
NORMS OR AVERAGES


       Number     Mean       Mean     Sigma      Sigma    r with Grades            Average r
Test   (Sample)   (Sample)   (Norm)   (Sample)   (Norm)   (Sample)        P.E.r    with Grades
CA-1   193         59.63     53       20.55      21       .460            .038     .473
CT-1   175         75.70     70       30.41      34       .441            .040     .534
EA-1   239         41.16     40        8.91      10       .507            .032     .415
ET-1   225        101.20     96       31.24      38       .410            .035     .486
FA-1   209         73.12     78       28.86      27       .642            .027     .548
FT-1   200         66.45     60       28.29      25       .532            .034     .478
ST-1   183         54.83     42       25.02      16       .535            .035     .527
MA-1   223         27.27     26       10.41      11       .405            .038     .451
MT-1   199         41.23     34       11.96      13       .360            .042     .486


In general the means and standard deviations do not deviate widely 
from the norms with the exception of Spanish Training. In this case 
both the mean and standard deviation are considerably higher than the 
norms but it must also be considered that the mean of the semester 
grades was above average. For this reason the sample was utilized 
in the detailed item analysis to be reported later. The Spanish group 
was a superior group both as to test scores and as to semester grades. 
The correlation coefficient for Spanish Training scores with first semes- 
ter Spanish grades as indicated in the table above is very nearly the 
same as the average coefficient reported thus far. The remaining correlation coefficients are nearer to the average coefficient than they are to either the lowest or the highest coefficient reported (see Table 1).

In deciding upon 200 as the approximate number of cases to be 
employed in the sample to be studied, certain practical as well as 
theoretical considerations played a part. Wood (75) reports that 
ratings of items as to difficulty based on 200 cases have a reliability 
coefficient of .92 and with 400 cases of .98. While estimates based on 200 cases leave much to be desired for precise prediction, a reliability coefficient of .92 is sufficiently high for the needs of this study.
The practical considerations limiting the number of cases were those 
of availability of materials and time required for analysis. 

To check the reliability reported by Wood, the percentages of students
getting each item correct in larger samples were correlated with the
corresponding percentages for the samples employed in this
investigation. The results are given in Table 4.


TABLE 4

PEARSON CORRELATION COEFFICIENTS BETWEEN PERCENTAGES OF CORRECT RESPONSES
TO ITEMS OF A TEST FOR A LARGER AND FOR A SMALLER GROUP

                                                          No. of
Test    N (Large)   N (Small)   Correlation Coefficient   Items
MA-1    1442        223         .921 ± .013               65
MT-1    1534        199         .918 ± .011               95
EA-1     828        239         .966 ± .007               65
CT-1     529        175         .937 ± .009               70
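The correlations in Table 4 are ordinary Pearson coefficients computed over items, each item contributing one pair of percentages (larger group, smaller group). A minimal sketch of that computation follows, together with the conventional probable error of r, .6745(1 - r^2)/sqrt(N). The item percentages are hypothetical. For MA-1 and MT-1 the tabled P.E. values are reproduced when N is taken as the number of items, which suggests, but does not confirm, that this is how they were obtained.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pe_of_r(r, n):
    """Probable error of a correlation coefficient: .6745 (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

# Hypothetical percentages of correct responses to the same five items.
large_group = [92, 75, 61, 40, 18]
small_group = [90, 78, 55, 44, 20]

r = pearson_r(large_group, small_group)
print(f"r = {r:.3f} +/- {pe_of_r(r, len(large_group)):.3f}")
print(f"P.E. for r = .921 over 65 items: {pe_of_r(0.921, 65):.3f}")
```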


4. The next step in the evaluation of the examinations as predictive
instruments consisted in computing, for the samples referred to above,
the (Pearson) correlation coefficients of each part of each test with
grades and of each part of a test with every other part of the same
test. In addition, the partial and multiple correlation coefficients
and the regression equation in deviation form, with scores expressed
as z scores,⁸ were found for each test. The following criteria will be
employed in the interpretation of the above. Other things being equal,
(a) those parts of an examination function well which correlate well
with grades; (b) each part of an examination should correlate as low
as possible with every other part; (c) the parts of an examination
which fail to meet conditions (a) and (b), and which have the lowest
regression coefficients when the scores are reduced to a comparable
basis, are contributing least to prediction and should be replaced if
possible by more valid parts; (d) in addition to meeting criteria (a),
(b), and (c), each part must be fairly reliable if it is to be
retained. The zero order, partial, and multiple coefficients, and the
regression equations, are given for each examination in later chapters.

⁸ The scores were expressed as z scores in the regression equation in
order to make the regression coefficients more directly comparable. To
convert a regression coefficient in deviation form to z score form, it
is necessary to multiply the coefficient by the quotient obtained by
dividing the standard deviation of the variable in question by the
standard deviation of the variable which is being predicted. For
example, given an equation in deviation form as follows:

$$x_1 = b_{12.3}\,x_2 + b_{13.2}\,x_3,$$

to change to z score form, let

$$z_1 = \frac{x_1}{\sigma_1}, \qquad z_2 = \frac{x_2}{\sigma_2}, \qquad z_3 = \frac{x_3}{\sigma_3}.$$

Then

$$z_1 = b_{12.3}\,\frac{\sigma_2}{\sigma_1}\,z_2 + b_{13.2}\,\frac{\sigma_3}{\sigma_1}\,z_3.$$

In the set-up for computing a partial standard deviation, the formula
in a 3-variable problem is:

$$\sigma_{1.23} = \sigma_1\sqrt{\left(1 - r_{12}^2\right)\left(1 - r_{13.2}^2\right)}.$$

If $\sigma_{1.23}$ is to be found in z score form, then $\sigma_1$ must be
in terms of z scores and becomes one, thus dropping out of the
equation. Either method outlined may be employed, or one may be used
as a check on the other.
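The partial standard deviation formula in footnote 8 can be checked numerically. The sketch below, for a 3-variable problem with hypothetical correlations, computes the standard partial regression (z score) weights from the zero order coefficients, the multiple correlation R_1.23, the partial correlation r_13.2, and sigma_1.23 by the footnote's formula; in z score form (sigma_1 = 1) the result should equal sqrt(1 - R_1.23^2). The closed-form weights are the standard two-predictor expressions, assumed here rather than quoted from the study.

```python
import math

def three_variable_problem(r12, r13, r23, sigma1=1.0):
    """Predict variable 1 from variables 2 and 3 using zero order correlations.

    Returns the standard partial regression weights (beta_12.3, beta_13.2),
    the multiple correlation R_1.23, the partial correlation r_13.2, and the
    partial standard deviation sigma_1.23 from the footnote's formula.
    With sigma1 = 1 (z score form), sigma_1.23 is in z units.
    """
    denom = 1.0 - r23 ** 2
    beta12_3 = (r12 - r13 * r23) / denom
    beta13_2 = (r13 - r12 * r23) / denom
    r_squared = beta12_3 * r12 + beta13_2 * r13   # R^2_1.23 from the beta weights
    r13_2 = (r13 - r12 * r23) / math.sqrt((1 - r12 ** 2) * (1 - r23 ** 2))
    sigma1_23 = sigma1 * math.sqrt((1 - r12 ** 2) * (1 - r13_2 ** 2))
    return beta12_3, beta13_2, math.sqrt(r_squared), r13_2, sigma1_23

# Hypothetical correlations: criterion with two test parts, and the parts with each other.
b2, b3, R, r13_2, s = three_variable_problem(r12=0.50, r13=0.40, r23=0.30)
print(f"z1 = {b2:.3f} z2 + {b3:.3f} z3;  R = {R:.3f};  sigma_1.23 = {s:.3f}")
# Cross-check: in z score form sigma_1.23 should also equal sqrt(1 - R^2).
print(f"check: {math.sqrt(1 - R ** 2):.3f}")
```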

5. An item analysis of each test, employing the sample described in
(3) above, was carried out in the manner outlined below. The papers of
students who received grades of A or B in the subject were grouped
together; those of C students and those of D and F students constituted
the other two groups. The items were then tabulated as right, wrong, or
omitted, and the percentage of rights on each item was computed for
each of the three groups, A-B, C, and D-F. Thus for any item the table
would show the percentage of A-B students answering correctly, the
percentage of C students answering correctly, and the percentage of
D-F students answering correctly. Because of the space required, these
results cannot be given here. The results of this item analysis were
then studied in detail to determine (a) which items discriminated
highly between A-B students and D-F students, (b) what were the
characteristics of those items which discriminate highly between the
A-B and the D-F groups and of those which do not discriminate well,
and (c) what principles could be formulated to apply to the
construction of new tests.
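The tabulation described in step 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the original hand procedure: responses are coded 'R', 'W', or 'O' (omitted), only 'R' counts as right, and the names `GROUP_OF_GRADE` and `percent_right_by_group` are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping of letter grades to the three analysis groups.
GROUP_OF_GRADE = {"A": "A-B", "B": "A-B", "C": "C", "D": "D-F", "F": "D-F"}

def percent_right_by_group(responses, grades):
    """Percentage of each grade group answering each item correctly.

    responses: one list per student, with 'R' (right), 'W' (wrong) or 'O' (omitted)
    grades: one letter grade per student, in the same order
    """
    n_items = len(responses[0])
    rights = defaultdict(lambda: [0] * n_items)
    counts = defaultdict(int)
    for answers, grade in zip(responses, grades):
        group = GROUP_OF_GRADE[grade]
        counts[group] += 1
        for i, mark in enumerate(answers):
            if mark == "R":          # wrong and omitted both count as not right
                rights[group][i] += 1
    return {g: [100.0 * r / counts[g] for r in rights[g]] for g in counts}

# Five hypothetical students answering two items.
responses = [["R", "W"], ["R", "R"], ["W", "O"], ["R", "W"], ["W", "W"]]
grades = ["A", "B", "C", "D", "F"]
table = percent_right_by_group(responses, grades)
for group in ("A-B", "C", "D-F"):
    print(group, table[group])
```

The A-B minus D-F difference for each item is then the quantity tested for significance in the paragraphs that follow.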

The question as to what amount of difference between the percentage
right for the A-B group and the percentage right for the D-F group is
significant may be determined by finding the probable error of each
proportion, employing the formula taken from Holzinger (25, p. 248),

$$PE_p = .6745\sqrt{\frac{pq}{N}},$$

where $q = 1 - p$ and $N$ is the number of cases in the group, and
from the values so found determining the probable error of the
difference between the proportion of rights for the A-B group ($p_1$)
and for the D-F group ($p_2$), employing the formula taken from
Holzinger (25, p. 249),

$$PE_{(p_1 - p_2)} = \sqrt{PE_{p_1}^2 + PE_{p_2}^2}.$$
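A short sketch of these two formulas, together with the rule of three probable errors adopted in the next paragraph, follows. The group sizes and proportions in the example are hypothetical, and `item_is_discriminating` is an illustrative name, not one used in the study.

```python
import math

PE_FACTOR = 0.6745  # probable error = .6745 x standard error

def pe_proportion(p: float, n: int) -> float:
    """Probable error of a proportion p observed in n cases: .6745 sqrt(pq / N)."""
    q = 1.0 - p
    return PE_FACTOR * math.sqrt(p * q / n)

def pe_difference(pe1: float, pe2: float) -> float:
    """Probable error of the difference between two independent proportions."""
    return math.sqrt(pe1 ** 2 + pe2 ** 2)

def item_is_discriminating(p_ab, n_ab, p_df, n_df, criterion=3.0):
    """Return the A-B minus D-F difference in P.E. units and whether it meets the criterion."""
    pe_diff = pe_difference(pe_proportion(p_ab, n_ab), pe_proportion(p_df, n_df))
    ratio = (p_ab - p_df) / pe_diff
    return ratio, ratio >= criterion

# Hypothetical item: 80% of 60 A-B students right, 40% of 50 D-F students right.
ratio, significant = item_is_discriminating(0.80, 60, 0.40, 50)
print(f"difference = {ratio:.2f} times its P.E.; significant: {significant}")
```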


In this study a difference which is three times its P.E. will be
considered significant, because the percentages right for the A-B group
almost without exception exceed the percentages right for the D-F
group, with the C group standing between these two extremes. In such a
situation it is extremely improbable that these differences could be
chance differences. The odds that chance alone would give a difference
as large as three times its probable error are roughly 1:22. That this
would occur 45 times out of 50 becomes so improbable that one can be
certain that sampling errors cannot account for the differences found.
When it is also considered that the differences between the average
total scores of the A-B and D-F groups are statistically significant,
the probability that chance factors account for the differences found
is negligible. Further, if the differences were due largely to chance,
then the sum of the differences (some being minus and some plus) should
approach zero. On the contrary, practically no minus differences are
found, and these can in most cases be shown to be due to a faulty
selection or wording of the item.
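The figure of roughly 1:22 can be checked from the normal curve: three probable errors equal 3 x .6745, or about 2.02 standard errors, and the chance of a deviation that large in either direction is about .043. The sketch below reproduces that figure and, under the simplifying assumption (ours, not the author's) that items behave as independent trials, the probability that 45 or more of 50 such differences arise by chance alone.

```python
import math

def normal_upper_tail(x: float) -> float:
    """P(Z >= x) for a standard normal deviate Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def binomial_tail(k: int, n: int, p: float) -> float:
    """P(at least k successes in n independent trials with success probability p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

three_pe = 3 * 0.6745                        # three probable errors, in standard-error units
p_chance = 2 * normal_upper_tail(three_pe)   # deviation this large in either direction
odds = (1 - p_chance) / p_chance

print(f"P(|difference| >= 3 P.E.) = {p_chance:.4f}, odds about 1:{odds:.0f}")
print(f"P(45 or more of 50 by chance) = {binomial_tail(45, 50, p_chance):.1e}")
```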


6. To show the discriminatory power of each part of a test as a whole,
the average z scores of the A-B group and of the D-F group on each part
were found. The difference between the average scores of these two
groups was then found to indicate the extent to which the two groups
had been separated. These differences have also been expressed in
terms of the percentage of overlapping of the A-B and D-F groups.
Other things being equal, a test which shows a small percentage of
overlapping of the A-B and D-F groups is desired.
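Step 6 can be sketched as below. Two assumptions are made that the text does not state: z scores are computed over the whole sample with the population standard deviation, and the percentage of overlapping is taken as the percentage of the D-F group reaching or exceeding the A-B mean, with D-F scores treated as normally distributed with unit standard deviation. Under that assumption a difference of 1.04 gives 14.92 per cent, the figure reported for Part 1 of Chemistry Training in Table 16. All names and sample scores are hypothetical.

```python
import math

def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def z_scores(raw):
    """Convert raw scores to z scores using the mean and population SD of the sample."""
    n = len(raw)
    m = sum(raw) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in raw) / n)
    return [(x - m) / sd for x in raw]

def group_difference(raw, grades):
    """Average z score of the A-B group minus that of the D-F group."""
    z = z_scores(raw)
    ab = [zi for zi, g in zip(z, grades) if g in ("A", "B")]
    df = [zi for zi, g in zip(z, grades) if g in ("D", "F")]
    return sum(ab) / len(ab) - sum(df) / len(df)

def percent_df_reaching_ab_mean(diff: float) -> float:
    """Per cent of D-F students at or above the A-B mean, assuming normal D-F
    scores with SD 1 in z units, centred `diff` below the A-B mean."""
    return 100.0 * (1.0 - normal_cdf(diff))

d = group_difference([10, 20, 30, 40], ["D", "F", "A", "B"])   # hypothetical scores
print(f"z difference = {d:.3f}")
for diff in (1.04, 1.01, 0.96, 0.91):
    print(f"{diff:.2f} -> {percent_df_reaching_ab_mean(diff):.2f}%")
```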

7. The final step in the analysis consisted of the segregation of
those items and parts which were highly discriminative and those but 
slightly so. These items or parts were then carefully examined to de- 
termine their characteristics and to formulate principles relative to the 
factors which make an item weak or strong in discriminative power. 
In many cases clear-cut characteristics were not found. In a few 
cases, however, this attack led to the discovery of significant trends. 


In summary, the analytical steps were: 


1. All the pertinent results reported by various investigators as to 
the predictive value and reliability of the tests were brought together. 


2. The reliability coefficients for each part of each test were 
computed. 


3. A sample of 200 or more cases was secured for each examination 
and checked as to its representativeness. 


4. The regression coefficient for each part of each test was found
in z score form to facilitate comparison of the relative contribution
of each part in predicting the criterion. In addition, the solution of
the 5-variable problem in computing the regression coefficients and
the partial and multiple coefficients of correlation gave the
correlation coefficients of each part of each examination with the
criterion and with each other.


5. An item analysis was made giving the percentage of A-B, C, and 
D-F students getting each item correct, with a determination of the dis- 
criminatory power of each item in terms of the difference between the 
percentage right for the A-B group and the D-F group. 


6. The average z scores of the A-B group and of the D-F group 
were found for each part of each test and the z score difference deter- 
mined. This was further expressed in terms of the percentage of 
overlapping of the scores of the A-B group and the D-F group. 


7. The items and parts of high or low discriminatory power were 
segregated for a study of their characteristics in an attempt to derive 
principles applicable to the construction of more valid items and tests. 


One problem which was regarded as incidental to the major problem,
but closely related to it, concerns the part contributed by the
reading tests to the predictive power of the aptitude tests. This
problem was approached through a study of the correlation of the
various reading sections with the criterion (grades) and of the
relative contribution of the reading sections by means of the
regression technique.


In planning the experimental attack upon the comparative study of 
techniques of testing, many practical difficulties prevented the inclusion 
of all possible techniques. For the most part, where comparisons are 
made, they are between the recall (completion) and the multiple-choice 
types of response. 


In constructing new parts and new examinations, the effort was made to
apply the principles derived from the analyses. In arranging for
groups for experimental testing which would permit thorough, detailed
inter-comparisons, practical difficulties were encountered, the chief
one being the amount of time which would be required of any group of
students. It is hardly possible to get a group for several periods of
testing when regular class time must be used.


In studying the new materials the analytical steps were essentially
the same as the above, so as to permit comparisons of the relative
effectiveness of the old and the new examinations.


CHAPTER IV 


CHEMISTRY APTITUDE AND CHEMISTRY TRAINING 
ANALYTICAL AND EXPERIMENTAL RESULTS 


CHEMISTRY APTITUDE 
ANALYTICAL PROCEDURES AND RESULTS 


The Chemistry Aptitude test is made up of four parts as follows: 

Part 1 (20 items, 15 minutes). Simple arithmetical relations which 
appear in chemistry. 

Part 2 (30 items, 12 minutes). Three paragraphs covering college 
textbook material in chemistry, followed by true-false statements which 
measure the student’s ability to read exactly and to resist generaliza- 
tions. 

Part 3 (15 items, 12 minutes). A measure of chemistry reading 
comprehension which, for correct answers, requires a grasp of the 
ideas and relations involved. 

Part 4 (60 items, 5 minutes). A measure of interest in chemistry 
by simple factual questions of the true-false type which indicate the 
accuracy of the student’s general knowledge of chemistry. Students 
with a particular fitness and liking for chemistry, it is assumed, ac- 
quire this knowledge incidentally. 

The results of the application of the multiple correlation and re- 
gression technique to the sample employed in this study are shown 
in Table 5. 

The multiple coefficient of .478 is not much higher than the zero
order coefficient of .460 for the whole test, which indicates that the
parts are almost optimally weighted as they are. This, together with
the regression coefficients, would seem to indicate that the least
effective parts, 2 and 4, which show fairly high inter-correlation with
Parts 1 and 3, should be eliminated and more effective materials
substituted for them. The function of Part 2 is already cared for by
the other parts, and Part 4 adds comparatively little to the
predictive power of the test.
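The z score weights in Table 5 follow from the raw-score coefficients by the conversion described in footnote 8: multiply each coefficient by the standard deviation of its part and divide by the standard deviation of the grades. The sketch below applies that rule to the Table 5 figures. It reproduces .359, .276, and .091, and gives -.190 rather than -.189 for Part 2, presumably because the raw coefficient -.037 is itself rounded.

```python
def to_z_form(b_raw, sigma_predictor, sigma_criterion):
    """Deviation-form (raw score) coefficient -> z score coefficient:
    multiply by the predictor's sigma and divide by the criterion's sigma."""
    return b_raw * sigma_predictor / sigma_criterion

SIGMA_GRADES = 1.143  # sigma of first semester Chemistry grades, Table 5
PARTS = {             # part: (raw coefficient, sigma of part score), Table 5
    "Part 1": (0.046, 8.916),
    "Part 2": (-0.037, 5.858),
    "Part 3": (0.044, 7.170),
    "Part 4": (0.032, 3.257),
}

betas = {name: to_z_form(b, s, SIGMA_GRADES) for name, (b, s) in PARTS.items()}
for name, beta in betas.items():
    print(f"{name}: {beta:+.3f}")
```

Once the weights are on this common basis, Part 1 (.359) and Part 3 (.276) clearly carry most of the prediction, which is the comparison the paragraph above relies on.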

The reliability coefficient for the entire test is .88. The coefficients 
for each part are shown in Table 6. 
TABLE 5
PREDICTIVE POWER FOR FIRST SEMESTER GRADES IN CHEMISTRY OF PARTS OF THE
CHEMISTRY APTITUDE EXAMINATION (REV.) A (MULTIPLE CORRELATION AND
REGRESSION WEIGHTS)
CHEMISTRY APTITUDE REVISED A (N = 193)

Variables
Variable 1--First semester Chemistry grades
Variable 2--Score Part 1
Variable 3--Score Part 2
Variable 4--Score Part 3
Variable 5--Score Part 4

Zero Order Coefficients

Variable    2       3       4       5       A.M.      Sigma
1           .423    .326    .411    .363     4.010    1.143
2                   .695    .538    .611    18.606    8.916
3                           .609    .502    13.610    5.858
4                                   .707    21.204    7.170
5                                            6.938    3.257

Multiple Coefficient
$R_{1.2345} = .478$

Regression Coefficients and Equations
$b_{12.345} = .046$
$b_{13.245} = -.037$
$b_{14.235} = .044$
$b_{15.234} = .032$

Deviation form, raw scores:
$x_1 = .046x_2 - .037x_3 + .044x_4 + .032x_5$

Deviation form, z scores:
$z_1 = .359z_2 - .189z_3 + .276z_4 + .091z_5$
TABLE 6
RELIABILITY COEFFICIENTS OF PARTS
Part 1    Part 2    Part 3    Part 4
.90       .84       .89       .58


So far as reliability is concerned, Parts 1, 2, and 3 are satisfactory 
but Part 4 is very unreliable. 

The differences between the average z scores for the A-B and the D-F 
groups show that Parts 1 and 3 are most effective in discriminating 
between these two groups, with Part 2 low and Part 4 almost as high 
as Part 3. These results are given in Table 7. 
The inter-part correlations (Table 9) for Part 1B were omitted
because this part was found to be so easy that over half of the scores
were perfect and all were high. This part will therefore receive no
further consideration. The above inter-part correlations indicate that,
while all the tests involve reading comprehension, they by no means
measure the same functions.


The results shown in Table 10 indicate that, of the new parts, 2A, 2B,
and 5B are very effective for the prediction of first semester grades.
Parts 2A and 2B are intended to be equivalent forms measuring the same
type of mental functions. This is a type of test which places before
the student certain principles and then requires their application to
representative situations. Part 5B similarly requires the application
of principles to the solution of problems. While Parts 3A, 3B, and 4B
are as effective as certain parts of the placement examination, they
do not give promise of adding significantly to prediction, although
Part 3A could be employed if needed.


All parts except 4B (Table 11) are very reliable for a single test
section. In view of the high reliability of 2A, 2B, and 5B, along with
their effectiveness for prediction, these parts give promise of adding
significantly to the usefulness of the placement test.


Table 12 gives the correlations between the more promising new parts 
and parts of the placement examination revised B. All these correla- 
tions are fairly low except between 2A (new) and Part 4 (placement). 


TABLE 10

MEANS, STANDARD DEVIATIONS, AND CORRELATIONS WITH FIRST SEMESTER GRADES,
EXPERIMENTAL FORMS

                                                Standard
                     r       P.E.     Mean      Deviation
Form A-1 (N = 237)
  Part 2A            .588    .028     24.010    8.778
  Part 3A            .434    .035     19.944    6.156
Form B-1 (N = 231)
  Part 2B            .662    .025     21.772    9.128
  Part 3B            .384    .038      8.255    2.454
Form B-2 (N = 242)
  Part 4B            .391    .037      6.459    4.781
  Part 5B            .585    .028      6.983    3.014
TABLE 11

RELIABILITY COEFFICIENTS OF (EXPERIMENTAL) PARTS

Form A-1 (N = 243)      Form B-1 (N = 246)      Form B-2 (N = 252)
Part 2A   .963          Part 2B   .972          Part 4B   .454
Part 3A   .989          Part 3B   .855          Part 5B   .907
TABLE 12

CORRELATION BETWEEN PARTS OF IOWA PLACEMENT CHEMISTRY APTITUDE REVISED B
AND CERTAIN PARTS OF EXPERIMENTAL FORMS

Experimental Form A-1     Chemistry Aptitude Rev. B (N = 86)
                          Part 1    Part 2    Part 3    Part 4
Part 2A                   .422      .375      .550      .771
Part 3A                   .420      .296      .422      .571

Experimental Form B-1     Chemistry Aptitude Rev. B (N = 76)
                          Part 1    Part 2    Part 3    Part 4
Part 2B                   .496      .237      .332      .255
Part 3B                   .357      .124      .194      .192

Experimental Form B-2     Chemistry Aptitude Rev. B (N = 71)
                          Part 1    Part 2    Part 3    Part 4
Part 5B                   .497      .248      .383      .384


As Part 4 is one of the parts to be eliminated, this fairly high
inter-correlation is not important. The important inter-correlations
are those of 2A, 2B, and 5B with Parts 1 and 3 of the placement test,
as these represent the most reliable and effective materials to be
combined into a new examination. The highest of these is .550 (2A vs.
3); the remaining ones are all below .50. The above results show that
the most effective new parts do not measure the same functions as the
most effective parts of the placement test.


A comparison of the relative effectiveness for prediction (Table 13)
of Parts 2 and 4 of placement examination Rev. B and the parts of the
new tests reveals that 2A, 3A, 2B, and 5B are decidedly superior to
Parts 2 and 4 of the placement examination. These parts (2 and 4) were
shown by the analytical results to be ineffective.


A study of combinations of parts shows that the sum of Parts 1 and 3
(placement) and 2A (new) correlates .665 with first semester grades, in
comparison with .565 for the total scores on the placement test. With
another group, a combination of Parts 1 and 3 (placement) and 2B (new)
correlates .540 with grades, as compared with .476 for the total
placement scores. A similar combination employing 5B (new) correlates
.641 with grades, in comparison with .438 for the total placement
score. Thus it would seem that by eliminating Parts 2 and 4 and
substituting for them any of these new parts, a more effective
instrument is produced.

TABLE 13

CORRELATION COEFFICIENTS WITH FIRST SEMESTER GRADES FOR PARTS OF THE PLACEMENT
CHEMISTRY APTITUDE REV. B AND EXPERIMENTAL FORMS
BASED ON SCORES FROM THE SAME GROUPS

Chemistry Aptitude Rev. B (N = 86)           Experimental Form A-1
                  r      P.E.                            r      P.E.
Part 2            .342   .065                Part 2A     .683   .039
Part 4            .067   .073                Part 3A     .588   .049
Total Score       .565   .050
Sum of 1 (I. P. E.) + 3 (I. P. E.) + 2A      .665   .041
Sum of 1 (I. P. E.) + 3 (I. P. E.) + 3A      .633   .043

Chemistry Aptitude Rev. B (N = 76)           Experimental Form B-1
                  r      P.E.                            r      P.E.
Part 2            .261   .072                Part 2B     .526   .056
Part 4            .314   .070                Part 3B     .316   .070
Total Score       .476   .060
Sum of 1 (I. P. E.) + 3 (I. P. E.) + 2B      .540   .055

Chemistry Aptitude Rev. B (N = 71)           Experimental Form B-2
                  r      P.E.                            r      P.E.
Part 2            .063   .080                Part 4B     .328   .072
Part 4            .341   .071                Part 5B     .686   .043
Total Score       .438   .065
Sum of 1 (I. P. E.) + 3 (I. P. E.) + 5B      .641   .047
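Why a sum of a few parts can outpredict the whole placement test is easier to see from the standard formula for the correlation of an unweighted sum with a criterion, which depends on the parts' validities, their standard deviations, and their inter-correlations. The sketch below uses hypothetical values, not the figures of Tables 12 and 13, to show that two parts of equal validity form a better composite when they correlate less with each other, in line with criterion (b) of the analysis. The formula is a textbook one assumed here; the study reports the combined correlations directly.

```python
import math

def composite_validity(sigmas, r_criterion, r_parts):
    """Correlation of the unweighted sum of part scores with a criterion.

    sigmas[i]      standard deviation of part i
    r_criterion[i] correlation of part i with the criterion (e.g. grades)
    r_parts[i][j]  correlation between parts i and j (1.0 on the diagonal)
    """
    k = len(sigmas)
    numerator = sum(sigmas[i] * r_criterion[i] for i in range(k))
    variance_of_sum = sum(
        sigmas[i] * sigmas[j] * r_parts[i][j] for i in range(k) for j in range(k)
    )
    return numerator / math.sqrt(variance_of_sum)

# Hypothetical: two parts with equal SDs, each correlating .45 with grades.
for r12 in (0.30, 0.80):
    r = composite_validity([1.0, 1.0], [0.45, 0.45], [[1.0, r12], [r12, 1.0]])
    print(f"parts inter-correlate {r12:.2f} -> sum correlates {r:.3f} with grades")
```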

The results of the item analysis (Table 14) show that Parts 2A, 2B,
and 5B rank very high in the number of effective items. Most of the
differences for Parts 2A and 2B are well above four times their
probable errors, and for Part 5B all are more than four times their
probable errors. Part 3A also ranks well in this respect, as does
Part 3B. Part 4B shows less than half of its items effective. The item
analysis reveals the same trend as the results given previously: Parts
2A, 2B, and 5B are very effective, Parts 3A and 3B are moderately so,
and Part 4B is of little value.


IOWA PLACEMENT EXAMINATIONS 33 


TABLE 14
NUMBER OF ITEMS SHOWING DIFFERENCES BETWEEN MEAN PERCENTAGE CORRECT
FOR THE A-B AND D-F GROUPS WHICH ARE THREE OR MORE
TIMES THEIR PROBABLE ERRORS

                 Total     No. 3 or More
                 Items     Times P. E.'s
Form A-1
  Part 2A        33        32
  Part 3A        24        18
Form B-1
  Part 2B        33        28
  Part 3B        10         1
Form B-2
  Part 4B        20         9
  Part 5B        10        10


The conclusions which the above results seem to warrant are: (1) that
Parts 1 and 3 of the placement test are effective and reliable, and
that Parts 2 and 4 are not effective for the prediction of grades, Part
4 also being too unreliable to be retained; (2) that of the new parts,
2A, 2B, and 5B are reliable and effective for prediction, being
superior to Parts 2 and 4 of the placement test; (3) that a
combination of Parts 1 and 3 of the placement test with either 2A, 2B,
or 5B gives a more effective predictive instrument than the present
placement test. Inasmuch as 2A and 2B are essentially equivalent forms
and are somewhat more reliable than 5B, for which no equivalent form
was prepared, it would seem that the best combination should include
these parts.


CHEMISTRY TRAINING 
ANALYTICAL PROCEDURES AND RESULTS 
The four parts of the Chemistry Training Examination are as follows:

Part 1 (45 items, 8 minutes). Knowledge of fundamentals of chemical
processes. Tested by true-false statements.

Part 2 (45 items, 12 minutes). Covers valences, formulas, names 
of compounds, and the completion and balancing of equations. 

Part 3 (50 items, 8 minutes). Manufacturing processes and the 
applications of chemistry. True-false statements. 

Part 4 (12 items, 15 minutes). Fundamental chemical problems in 
which the mechanics of arithmetic are reduced to a minimum. 

The application of the partial and multiple correlation and regres- 
sion technique gave the results shown in Table 15. 
TABLE 15
PREDICTIVE POWER FOR FIRST SEMESTER GRADES IN CHEMISTRY OF THE VARIOUS
PARTS OF THE CHEMISTRY TRAINING EXAMINATION (REVISED) A (MULTIPLE
CORRELATION AND REGRESSION WEIGHTS)
CHEMISTRY TRAINING REVISED A (N = 174)

Variables
Variable 1--First semester Chemistry grades
Variable 2--Score Part 1
Variable 3--Score Part 2
Variable 4--Score Part 3
Variable 5--Score Part 4

Zero Order Coefficients

Variable    2       3       4       5       A.M.      Sigma
1           .368    .317    .367    .354     3.891     1.186
2                   .378    .626    .494    22.434    10.065
3                           .554    .403    21.828     9.597
4                                   .525    16.674     8.739
5                                           16.087     9.087

Multiple Coefficient
$R_{1.2345} = .448$

Regression Coefficients and Equations
$b_{12.345} = .0198$
$b_{13.245} = .0156$
$b_{14.235} = .0133$
$b_{15.234} = .0215$

Deviation form, raw scores:
$x_1 = .020x_2 + .016x_3 + .013x_4 + .022x_5$

Deviation form, z scores:
$z_1 = .169z_2 + .129z_3 + .096z_4 + .169z_5$


The multiple coefficient .448 is not significantly above the zero order
coefficient for the whole test, .441, indicating that the parts are
optimally weighted as they are. The sample employed in this
investigation, however, shows a lower correlation with grades than is
usually obtained with this examination. The mean of the twenty-three
coefficients reported in Chapter III is .534, and if the three low
coefficients, each based on a very small number of cases, are
eliminated, the range of the coefficients is from .432 to .689. The
regression coefficients indicate that Parts 1 and 4 are most effective
for prediction, with Part 2 somewhat below them and Part 3 less
effective than Part 2. All parts, however, are contributing
significantly to prediction.

The differences between the average z scores of the A-B and the D-F
groups, shown in Table 16, indicate that each part discriminates well
between superior and inferior students and that the whole test is very
effective in this respect.
TABLE 16
DIFFERENCES BETWEEN AVERAGE Z SCORES OF A-B AND D-F GROUPS

                              Whole Test   Part 1   Part 2   Part 3   Part 4
Difference                    1.36         1.04     1.01     .96      .91
Percentage D-F reaching or
  exceeding mean A-B          8.69         14.92    15.62    16.85    18.14


The reliability coefficient for the whole test, .93, is as high as is 
usually found for objective tests. The part reliabilities are shown in 
Table 17. 

TABLE 17
RELIABILITY COEFFICIENTS OF PARTS OF CHEMISTRY TRAINING
REV. A (N = 100)
Part 1    Part 2    Part 3    Part 4
.83       .87       .85       (illegible)


The part reliabilities are relatively high for such short units. Part 4 
should be improved in this respect. 

The item analysis gave the results shown in Table 18. These indicate
that Part 2 is most effective in this respect, with Part 4 next and
Parts 1 and 3 least effective. This is probably to be expected for
true-false items, and it is doubtful whether this is a satisfactory
criterion for such items. Differences which are less than three times
their probable errors may conceivably be significant when all are in
the same direction.

TABLE 18
NUMBER OF ITEMS IN EACH PART SHOWING DIFFERENCES BETWEEN AVERAGE PERCENTAGE
CORRECT FOR A-B AND D-F STUDENTS WHICH ARE THREE OR MORE AND FOUR OR
MORE TIMES THEIR PROBABLE ERRORS

          Total     No. 3 or More     No. 4 or More
          Items     Times P. E.'s     Times P. E.'s
Part 1    45        16                12
Part 2    45        29                24
Part 3    50         1                 6
Part 4    12         6                 6


A consideration of the content validity of the examination based on 
textbook analyses and expert judgment indicates that the examination 
is satisfactory in this respect, covering as it does valence, formulas, 
names of compounds, equations, problems, knowledge of chemical 
processes, and applications to manufacturing processes. 

The results of the analysis show the following: (1) that in view of
the effectiveness of each part it would be desirable to build
additional items to replace those which are non-functional; (2) that
for Part 4 many problems should be tested at various levels of
difficulty to permit the selection of a better distribution as to
difficulty; (3) that Part 2 could very well remain unchanged except for
the elimination of non-functional items; (4) that for Part 2 a more
easily scored type of response would be desirable, provided this does
not lower the predictive power and reliability of the instrument; (5)
that the content of the examination is well balanced and valid.


EXPERIMENTAL PROCEDURES AND RESULTS 
Experimental forms were prepared as follows: 
Chemistry Training A 
Part 1A. True-false items dealing with fundamental processes 
as in Part 1 of the Iowa Placement Chemistry Train- 
ing Examination (45 items). 

Part 2A. A. Valence tested by giving the valence of one ele- 
ment or ion of a compound from which the val- 
ence of the other is to be derived; 

B. Writing formulas for compounds when a table 
of valences is given; 

C. Matching names of compounds with formulas; 
D. Completion and balancing of equations. 
(52 items in all sections). 

Part 3A. True-false items dealing with applications of proc- 
esses (48 items). 

Chemistry Training B 

Part 1B. Same as 1A, Form A. 

Part 2B. Same as 2A, Form A. 

Part 3B. Same as 3A, Form A. 

Chemistry Problems. Twenty-one problems graded from easy to difficult.


In the construction of the new sections various high school and 
college textbooks in Chemistry were consulted. The aims were: (1) 
to adequately sample the field; (2) to include the topics which are 
emphasized at both the high school and college levels; and (3) to 
select from the topics taught in high school those which are of greatest 
importance for success in college courses. The third point was based 
upon the judgment of a college instructor of freshman chemistry who
has devoted many years to a study of this problem and related prob- 
lems. 

The textbooks consulted most frequently were: (1) Black and 
Conant (4); (2) Gray, Sandifur and Hanna (19); (3) Holmes (24); 
(4) McPherson and Henderson (37). 


The results for these forms are given in the tables which follow. 


The inter-part correlations (Table 19) show that the various sections
are not closely related. The coefficients of greatest importance are
those of Part 2 with the remaining parts. These indicate that the
relationship is not very high, and no higher than the coefficients
between Part 2 and Parts 1 and 3 of the placement examination (Table
15). Thus the type of response employed in Part 2 of the new forms
does not bring about a greater similarity as to the functions
measured.


In reliability (Table 20), Parts 1 and 3 are somewhat lower than the
similar parts of the placement examination. As the most effective items
of these parts are to be combined with the best in the placement
examination, highly reliable sections are assured. The new Part 2 is
more reliable than Part 2 of the placement examination. The chemistry
problems part (new) is somewhat more reliable than the related part of
the placement test. This, however, can be explained as due to the
increased length of the new part.
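The length explanation can be made concrete with the Spearman-Brown formula, which predicts the reliability of a test lengthened n-fold with comparable items as n r / (1 + (n - 1) r). The formula is a standard one and is not invoked in the text. The 12-problem and 21-problem lengths come from the descriptions of Part 4 and of the new Chemistry Problems section, but the starting reliabilities below are hypothetical.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by length_factor
    with comparable items: n r / (1 + (n - 1) r)."""
    n, r = length_factor, reliability
    return n * r / (1 + (n - 1) * r)

# Part 4 of the placement test has 12 problems; the new problems part has 21.
factor = 21 / 12
for r_short in (0.60, 0.70):   # hypothetical reliabilities of the 12-problem part
    print(f"r = {r_short:.2f} -> predicted {spearman_brown(r_short, factor):.3f} at 21 problems")
```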


The means and standard deviations (Table 21) show that Forms 
A and B are but roughly equivalent, the greatest difference being 
6.75 between the means of 3A and 3B, with the remaining parts agree- 
ing within less than 1.5 points. 


Parts 1 and 3 of both new forms are fairly effective in the prediction 
of first semester grades and when the best items are combined with 
the best items of the placement test the parts which result should be 
more effective in this respect than either of the original parts. 


Part 2 (new), employing a more easily scored response than Part 2 of
the placement examination, is very effective for grade prediction. The
problem test is also effective for grade prediction. The total scores
for Form A and for Form B correlate .620 and .579 with grades
respectively, indicating moderately high effectiveness. However, when
the ineffective items are eliminated and the problem section is
included, the resulting forms can be expected to perform better than
the original forms.
TABLE 19
INTER-PART CORRELATIONS

Form A (N = 177)
            Part 1A   Part 2A   Part 3A
Part 1A     ---       .518      .567
Part 2A     .518      ---       .383
Part 3A     .567      .383      ---

Form B (N = 229)
            Part 1B   Part 2B   Part 3B
Part 1B     ---       .542      .561
Part 2B     .542      ---       .443
Part 3B     .561      .443      ---
TABLE 20
RELIABILITY COEFFICIENTS OF PARTS

Form A (N = 183)        Form B (N = 238)
Part 1A   .796          Part 1B   (illegible)
Part 2A   .956          Part 2B   .932
Part 3A   .752          Part 3B   .856

Chemistry Problems (N = 213)
.784
TABLE 21
CORRELATION WITH FIRST SEMESTER CHEMISTRY GRADES, MEANS, AND STANDARD
DEVIATIONS

                                         Standard
               r      P.E.    Mean       Deviation
Form A (N = 177)
  Part 1A      .562   .034    23.559      8.901
  Part 2A      .602   .032    40.662      8.406
  Part 3A      .342   .045    17.610      8.952
  Total        .620   .031    81.389     21.294
Form B (N = 229)
  Part 1B      .471   .034    22.152      8.238
  Part 2B      .527   .032    40.440      7.602
  Part 3B      .458   .035    24.366      8.964
  Total        .579   .030    87.150     20.398
Chemistry Problems (N = 211)
               .562   .030    13.830      3.483


The results of the item analysis (Table 22) indicate that from the new
Parts 1 and 3 there are 91 effective items available to combine with
the effective items from Parts 1 and 3 of the placement examination.
There are additional items whose differences are almost three times
their probable errors, and these may also be employed. The new Parts 2
show a larger proportion of effective items than Part 2 of the
placement examination, and in addition the new parts are effective for
the prediction of grades. This would seem to indicate that the type of
response employed in the new Part 2, in both Form A and Form B, does
not lower the effectiveness of the test and provides a section which
can be more rapidly and easily scored. The new problems section
provides fifteen effective items. By selecting the best problem items,
an effective and reliable section is assured.

TABLE 22

NUMBER OF ITEMS SHOWING DIFFERENCES BETWEEN MEAN PERCENTAGE CORRECT
FOR THE A-B AND D-F GROUPS WHICH ARE THREE OR MORE
TIMES THEIR PROBABLE ERRORS

                       Total     No. 3 or More
                       Items     Times P. E.'s
Form A
  Part 1               45        26
  Part 2               52        40
  Part 3               48        17
Form B
  Part 1               45        25
  Part 2               52        35
  Part 3               48        23
Chemistry Problems     21        15
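Assembling the revised sections amounts to counting, for each part, the items whose A-B minus D-F difference is at least three times its probable error (the tally shown in Table 22) and then keeping the strongest of them. A minimal sketch, with hypothetical item ratios and illustrative function names that do not come from the study:

```python
def count_effective(items, criterion=3.0):
    """items: list of (part, ratio), where ratio = (A-B % right - D-F % right) / P.E.
    Returns {part: (total items, items at or above the criterion)}."""
    summary = {}
    for part, ratio in items:
        total, effective = summary.get(part, (0, 0))
        summary[part] = (total + 1, effective + (1 if ratio >= criterion else 0))
    return summary

def select_best(items, part, keep):
    """Indices of the `keep` items in `part` with the largest ratios."""
    ranked = sorted(
        (i for i, (p, _) in enumerate(items) if p == part),
        key=lambda i: items[i][1],
        reverse=True,
    )
    return ranked[:keep]

items = [("Part 1", 4.2), ("Part 1", 2.1), ("Part 1", 3.0),
         ("Problems", 5.5), ("Problems", 1.2)]
print(count_effective(items))            # totals and effective counts per part
print(select_best(items, "Part 1", 2))   # indices of the two strongest Part 1 items
```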

The conclusions which the above results would seem to justify are: 
(1) that the new Forms A and B are effective for tests of this type 
for the prediction of grades; (2) that when, for new Parts 1 and 3 
the most effective items are combined with the most effective items 
in the placement examinations, the resulting sections will be highly 
reliable and superior to the original parts for grade prediction; (3) 
that the type of response employed in Part 2, new, does not decrease the 
predictive power of the part in comparison with Part 2 of the place- 
ment test. The new Part 2 is more reliable, more easily scored, and 
does not correlate highly with the remaining parts; (4) that a problem 
section based on the new form retaining only the most effective prob- 
lems will give a section which is better scaled as to difficulty, is re- 
liable, and is effective for grade prediction. 


CHAPTER V


ENGLISH APTITUDE AND ENGLISH TRAINING: ANALYTICAL
AND EXPERIMENTAL RESULTS


ENGLISH APTITUDE 
ANALYTICAL PROCEDURES AND RESULTS 


The four parts of the English Aptitude Examination are as follows: 

Part 1. The student is given a rule taken from an English text- 
book, together with samples of applications of the rule. Additional 
sentences are given to be marked R (right) or W (wrong) as to their 
correct or incorrect application of the rule. There are two rules each 
followed by ten sentences. Time, eight minutes. 

Part 2. A passage of compact material is quoted from a college
textbook of English. The student is required to check which of three
statements best expresses the related idea in the quoted material.
There are ten items. Time, eight minutes.

Part 3. A passage of poetry is given, employing the Iowa Compre-
hension Test method for the measurement of the comprehension of the
passage. There are fifteen items. Time, twelve minutes.

Part 4. Twenty sets of three statements each are given. The best
statement in each set is to be checked in accordance with a brief
theme which accompanies them. Time, fifteen minutes.

In general the central aim is to present the student with rules to 
apply, a passage to be read for facts, applications, etc. No training is 
assumed except that of ability to read and understand college text 
materials. A student with poor English training but with good ability 
in reading and in the application of ideas gained from reading could 
get a high score. 

The results of the application of the partial and multiple correlation 
and regression techniques to the sample employed in this investigation 
are given in Table 23. 

The multiple coefficient .752 indicates that the test is an effective
predictive instrument. Since the zero order coefficient .507 obtained
for this sample is so far below the multiple coefficient, the present
weightings of the parts of the test need to be changed. The re-
gression coefficients show that Part 1 is making the least contribution
to the prediction of grades. It is, however, probably worth retaining,
especially as no more promising sections are available. Part 4 is
contributing most to prediction, with Parts 3 and 2 following in the
order named.

TABLE 23

PREDICTIVE POWER FOR FIRST SEMESTER GRADES IN ENGLISH OF THE VARIOUS PARTS
OF THE ENGLISH APTITUDE EXAMINATION (REVISED) A
(MULTIPLE CORRELATION AND REGRESSION WEIGHTS)

ENGLISH APTITUDE REVISED A (N = 238)

Variables
Variable 1: First semester English grades
Variable 2: Score Part 1
Variable 3: Score Part 2
Variable 4: Score Part 3
Variable 5: Score Part 4

Zero Order Coefficients
Variable     2      3      4      5      A.M.    Sigma
   1       .312   .503   .423   .686     3.86    1.105
   2              .290   .274   .241     7.03    5.347
   3                     .328   .461     7.87    2.229
   4                            .242    12.39    2.694
   5                                    14.10    2.403

Multiple Coefficient
R1.2345 = .752

Regression Coefficients and Equations
b12.345 = .015    Deviation form raw scores
b13.245 = .079    x1 = .015x2 + .080x3 + .090x4 + .249x5
b14.235 = .090    Deviation form z scores
b15.234 = .249    z1 = .073z2 + .161z3 + .219z4 + .542z5
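The z-score weights and the multiple coefficient of Table 23 can be recovered from the zero order coefficients alone by solving the normal equations, then taking R as the square root of the sum of each weight times its validity. The sketch below does this in plain Python with Gaussian elimination. It is an illustration of the standard technique, not a record of how the original computation was carried out. Because the tabled coefficients are rounded to three places, the recomputed values agree with the printed ones only to within about .01.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

# Table 23: validities r12..r15 and intercorrelations of Parts 1-4
r_crit = [0.312, 0.503, 0.423, 0.686]
r_pred = [
    [1.000, 0.290, 0.274, 0.241],
    [0.290, 1.000, 0.328, 0.461],
    [0.274, 0.328, 1.000, 0.242],
    [0.241, 0.461, 0.242, 1.000],
]
sigmas = [5.347, 2.229, 2.694, 2.403]  # Parts 1-4
sigma_grade = 1.105

betas = solve(r_pred, r_crit)                               # z-score weights
R = math.sqrt(sum(b * r for b, r in zip(betas, r_crit)))    # multiple coefficient
raw_b = [beta * sigma_grade / s for beta, s in zip(betas, sigmas)]  # raw-score weights

print([round(b, 3) for b in betas], round(R, 3), [round(b, 3) for b in raw_b])
```

The large weight on Part 4 and the small one on Part 1 follow directly from the correlations. Part 4 has both the highest validity and fairly low overlap with the other parts.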


The difference in average z scores between the A-B and the D-F 
groups shown in Table 24 is 1.28 for the whole test with 10.03 per cent 
of D-F students reaching or exceeding the mean of the A-B students. 
While the individual parts do not rank high in discriminatory power, 
the fact that they correlate low with each other makes for a fairly 
large net contribution for each part. This is reflected in the rather 
high multiple coefficient. 
TABLE 24

DIFFERENCES BETWEEN AVERAGE Z SCORES OF A-B AND D-F GROUPS AND PERCENTAGE
OF D-F GROUP REACHING OR EXCEEDING MEAN OF A-B GROUP

                                      Whole Test  Part 1  Part 2  Part 3  Part 4
Difference                               1.28       .90     .88     .66     .96
Percentage D-F reaching or exceeding
  mean A-B                              10.03     18.41   18.94   25.46   16.85
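The percentages in the second row of Table 24 agree, to two decimals, with the area of the normal curve lying beyond the tabled difference in average z scores. The sketch below assumes that this is how they were obtained. It treats D-F scores as normally distributed with unit standard deviation, so the share reaching the A-B mean is the upper tail beyond the difference.

```python
import math

def pct_reaching_mean(z_diff: float) -> float:
    """Per cent of a unit-normal group at or above a point z_diff above its mean."""
    return 100 * 0.5 * math.erfc(z_diff / math.sqrt(2))

# Differences in average z scores, Table 24
table_24 = {"Whole Test": 1.28, "Part 1": 0.90, "Part 2": 0.88,
            "Part 3": 0.66, "Part 4": 0.96}
for label, d in table_24.items():
    print(f"{label}: {pct_reaching_mean(d):.2f}")
```

Run on the five differences, this gives 10.03, 18.41, 18.94, 25.46, and 16.85, the figures in the table. A larger separation between the groups thus appears as a smaller percentage of D-F students overlapping the A-B mean.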


The reliability coefficient for the whole test, as given in Chapter III,
is .82. The reliability coefficients for the various parts are shown in
Table 25.


TABLE 25
RELIABILITY COEFFICIENTS FOR PARTS OF ENGLISH APTITUDE
REV. A (N = 100)


Part 1 Part 2 Part 3 


Part 4 
77 E 


69 40 


TABLE 26
NUMBER OF ITEMS SHOWING DIFFERENCES IN MEAN PERCENTAGE CORRECT BETWEEN
THE A-B AND THE D-F GROUPS WHICH ARE THREE OR MORE TIMES
THEIR PROBABLE ERRORS

           Total     No. 3 or More    No. 4 or More
           Items     Times P.E.'s     Times P.E.'s
Part 1       20           12                8
Part 2       10            6                5
Part 3       15            5                2
Part 4       20           10                3


The results of the item analysis, Table 26, indicate that Parts 1 and 
2 have a larger proportion of effective items than the remaining two 
parts. Part 3 has but five items which show statistically significant 
differences in mean per cent correct between the A-B and the D-F 
groups. Many of the items in Part 3 are very easy, nine of the fifteen 
items being answered correctly by 80 per cent or more of the D-F 
students. Fifty per cent of the items in Part 4 are effective. For 
Parts 1, 2, and 4 additional effective items are needed. The reading 
materials in Part 3 are probably too easy for the college level. A 
similar but more difficult section is suggested. 


Inasmuch as the only objective criterion for an aptitude test is
that of predictive power, it would seem that this examination meets
this criterion well and that a revision should proceed along lines of
improving existing parts except for Part 3, for which a similar but
more effective and reliable section should be substituted. Part 4 should
be made more reliable.


EXPERIMENTAL PROCEDURES AND RESULTS 

For Parts 1, 2, and 4 additional items were prepared, based on the
same materials as in the placement tests. In the case of Part 1 the aim
was to provide sentences all of which would discriminate effectively
between good and poor students; the aim was similar for Parts 2
and 4.

Part 3 was eliminated, and three selections of greater difficulty were
tried out, employing the same method of testing reading comprehension.
Two were poetry and one a prose selection.


The new forms are identified as follows:
English Aptitude A-1
    Part 1. Additional rules as in Part 1 of E. A. Rev. A.
    Part 2. Additional items on the same materials as for Part 2 of
            E. A. Rev. A.
    Part 3. Reading comprehension as in Part 3 of E. A. Rev. A,
            but more difficult. Poetry.
English Aptitude B-1
    Part 1. Reading comprehension as in Part 3 of Form A-1
            above.
    Part 2. Reading comprehension as in Part 3 of Form A-1
            above except that prose materials are employed.
    Part 3. Additional items for materials in Part 4 of E. A.
            Rev. A.
The results found are given in the following tables:


TABLE 27
INTER-PART CORRELATIONS

English Aptitude A-1 (N = 272)
           Part 1    Part 2    Part 3
Part 1       --       .407      .471
Part 2      .407       --       .470
Part 3      .471      .470       --

English Aptitude B-1 (N = 263)
           Part 1    Part 2    Part 3
Part 1       --       .596      .448
Part 2      .596       --       .487
Part 3      .448      .487       --
TABLE 28
RELIABILITY COEFFICIENTS OF PARTS

English Aptitude A-1 (N = 266)      English Aptitude B-1 (N = 264)
Part 1    .640                      Part 1    .856
Part 2    .773                      Part 2    .876
Part 3    .777                      Part 3    .603

TABLE 29

CORRELATIONS WITH FIRST SEMESTER ENGLISH GRADES,
MEANS AND STANDARD DEVIATIONS

                                                       Standard
                     r       P.E.       Mean          Deviation
English Aptitude A-1 (N = 272)
Part 1             .507      .030       9.986           7.120
Part 2             .496      .031       7.849           2.034
Part 3             .446      .033       4.960           3.549
Total              .597      .026      22.335          10.539
English Aptitude B-1 (N = 263)
Part 1             .430      .034       7.202           3.442
Part 2             .459      .033      15.460           3.641
Part 3             .372      .036       8.464           2.032
Total              .492      .031      31.197           7.626


The inter-part correlations, Table 27, show that none of the parts
are closely related in function. The correlations with first semester
grades for each part, Table 29, show that none of the parts is highly
effective in predictive value, Part 1 of Form A-1 being the only part
above .50. On the other hand, all parts are moderately effective, the
lowest coefficient being .372. Because the inter-part correlations are low,
the total scores correlate well with grades for Form A-1, but less so
for Form B-1.
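Low inter-part correlations raise the validity of the total score because of the formula for the correlation of a sum with a criterion. That formula is r(c, total) = (sum of r_ci * s_i) / s_total, with s_total^2 = sum of s_i^2 + 2 * sum over i < j of r_ij * s_i * s_j. The sketch below applies it to the Form A-1 figures of Tables 27 and 29. Those tables rest on slightly different N's, so the result (about .59, with a total standard deviation of about 10.48) only approximates the tabled .597 and 10.539. The second computation, in which every inter-part correlation is raised to .90, is hypothetical and is included only for contrast.

```python
import math

def composite_r(r_crit, sigmas, r_inter):
    """Correlation of a criterion with the unweighted sum of k parts.

    r_crit[i]     : correlation of part i with the criterion
    sigmas[i]     : standard deviation of part i
    r_inter[i][j] : correlation between parts i and j
    Returns (r with criterion, standard deviation of the total score).
    """
    k = len(sigmas)
    var_total = sum(s * s for s in sigmas)
    for i in range(k):
        for j in range(i + 1, k):
            var_total += 2 * r_inter[i][j] * sigmas[i] * sigmas[j]
    sd_total = math.sqrt(var_total)
    # Covariance of criterion with the total, divided by the criterion's SD
    numerator = sum(r * s for r, s in zip(r_crit, sigmas))
    return numerator / sd_total, sd_total

# Form A-1, Parts 1-3 (Tables 27 and 29)
r_grades = [0.507, 0.496, 0.446]
sds = [7.120, 2.034, 3.549]
inter = [
    [1.000, 0.407, 0.471],
    [0.407, 1.000, 0.470],
    [0.471, 0.470, 1.000],
]
r_total, sd_total = composite_r(r_grades, sds, inter)

# Hypothetical contrast: same part validities, every inter-part r raised to .90
high = [[1.0 if i == j else 0.90 for j in range(3)] for i in range(3)]
r_high, _ = composite_r(r_grades, sds, high)
print(round(r_total, 3), round(sd_total, 2), round(r_high, 3))
```

With the higher intercorrelations the composite coefficient falls to about .50. Parts that overlap heavily add variance to the total without adding new predictive information.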


The reading comprehension sections, Part 3 of Form A-1 and Parts
1 and 2 of Form B-1, are approximately equal in predictive power.
The reliability coefficients (Table 28) are fairly high for these sec-
tions, especially for Parts 1 and 2 of Form B-1. Part 3 of Form B-1 is
low in predictive power and low in reliability. Unless a combination
of the best items from this part and Part 4 of the placement test gives
increased reliability, further experimental work will be required to
discover a better type of test. Part 1, Form A-1, is also low in reli-
ability. However, as only the best items from this part are to be
combined with the best items of Part 1 of the placement test, which
is fairly reliable, it would seem that a satisfactory section can be
expected.
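The Spearman-Brown prophecy formula, r_kk = k*r / (1 + (k - 1)*r), gives a rough estimate of how much pooling items from two sections might raise reliability. The formula assumes that the added items are comparable to the existing ones. A selection of "best items" only approximates that assumption, so the sketch below is an illustration under that assumption and not the procedure used in this study.

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability of a test lengthened k times with comparable items."""
    if not 0.0 <= r <= 1.0 or k <= 0:
        raise ValueError("need 0 <= r <= 1 and k > 0")
    return k * r / (1 + (k - 1) * r)

# Reliabilities from Table 28, each section doubled in length
for label, r in [("Part 1, Form A-1", 0.640), ("Part 3, Form B-1", 0.603)]:
    print(label, round(spearman_brown(r, 2), 3))
```

Doubling Part 1 of Form A-1 (.640) predicts a reliability of about .78, and doubling Part 3 of Form B-1 (.603) predicts about .75. This is the order of gain that combining two comparable sets of items might be expected to bring.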
The results of the item analysis are summarized in Table 30. 


TABLE 30

NUMBER OF ITEMS SHOWING DIFFERENCES BETWEEN THE MEAN PERCENTAGE CORRECT
FOR THE A-B AND THE D-F GROUPS WHICH ARE THREE OR MORE TIMES
THEIR PROBABLE ERRORS

         English Aptitude A-1                English Aptitude B-1
         Total     No. 3 Times               Total     No. 3 Times
         Items     P.E.'s                    Items     P.E.'s
Part 1     30         18           Part 1      20         13
Part 2     10          8           Part 2      20         14
Part 3     20         13           Part 3      12          6


The results of the item analysis of the experimental forms indicate 
that the new materials prepared to improve Parts 1, 2, and 4 of the 
placement examination will provide a surplus of effective items. This 
will make possible the construction of revised sections made up entirely 
of such items. The new reading comprehension sections designed as 
possible substitutes for Part 3 of the placement examination are Part 
3 of Form A-1 and Parts 1 and 2 of Form B-1. The proportions of
effective items in these parts are 65, 65, and 70 per cent respectively 
in the order named above. The part which they are designed to re- 
place contains but 33 per cent of effective items. In addition to con- 
taining more items which discriminate significantly between inferior 
and superior students these sections were shown above to be consid- 
erably more reliable than Part 3 of the placement test. 


The conclusions which seem justified from the above are: (1) that 
the new reading comprehension sections designed to replace Part 3 
of the placement examination are more reliable and effective than the 
part to be replaced; (2) that a combination of the best items and 
parts of the new forms with the present placement examination mate- 
rials should give a more effective instrument; (3) that Part 4 of the 
placement examination and Part 3 of the new Form B-1 which are 
based on the same reading materials may be retained because of their 
importance for prediction; reliability may be improved by selecting 
the best items from each to combine into a single final form; and (4) 
that further research should be concerned with the discovery of new 
materials which are more valid and reliable. 
ENGLISH TRAINING
ANALYTICAL PROCEDURES AND RESULTS 


The Iowa Placement English Training Examination consists of four 
parts as follows: 


Part 1. Spelling. Seventy-five words in a list of which twenty-five 
are misspelled. The student must correct these. Time, eight minutes. 


Part 2. Punctuation. Sixty sentences, some of which are punc- 
tuated correctly and some of which are not correctly punctuated. The 
sentences are to be marked R if correct and W if wrong. Time, eleven 
minutes. 


Part 3. English grammar. Sixty sentences to be marked correct
(R) or incorrect (W). Common errors are stressed. Time, eleven
minutes.


Part 4. Forty-five items to determine whether or not the student 
can distinguish good, clear, emphatic sentences from weak, confused, 
or ridiculous sentences. Time, ten minutes. 


In considering the test as a whole it may be described as measuring 
the most important factors in English which can be tested by objective 
techniques. 


The results of the application of the partial and multiple correla- 
tion and regression techniques to the sample employed in studying 
the examination are given in Table 31. 


The low zero order coefficients for Parts 1 and 4 indicate that these 
parts are not very effective for the prediction of grades. This is also 
borne out by the low negative regression coefficient for Part 1 and the 
low positive coefficient for Part 4. The multiple coefficient of .422 is
but little higher than the observed zero order coefficient of .410 for 
this sample. If possible the multiple coefficient should be greatly 
increased in the reconstruction. 


The differences between the average z scores for the A-B and for 
the D-F groups shown in Table 32 indicate that Parts 1 and 4 are 
not effective in discriminating between the superior and inferior groups, 
and that Parts 2 and 3 are high in discriminatory power as compared 
with other parts of the placement series. 
TABLE 31
PREDICTIVE POWER FOR FIRST SEMESTER GRADES IN ENGLISH OF THE VARIOUS PARTS
OF THE ENGLISH TRAINING EXAMINATION (REVISED) A
(MULTIPLE CORRELATION AND REGRESSION WEIGHTS)
ENGLISH TRAINING REVISED A (N = 255)

Variables
Variable 1: First semester English grades
Variable 2: Score Part 1
Variable 3: Score Part 2
Variable 4: Score Part 3
Variable 5: Score Part 4

Zero Order Coefficients
Variable     2      3      4      5      A.M.    Sigma
   1       .445   .340   .366   .249    3.851    1.106
   2              .364   .271   .465   34.480    8.981
   3                     .471   .344   28.288   12.098
   4                            .352   23.306   13.224
   5                                   14.892    7.373

Multiple Coefficient
R1.2345 = .422

Regression Coefficients and Equations
b12.345 = -.001    Deviation form raw scores
b13.245 =  .018    x1 = -.001x2 + .018x3 + .021x4 + .001x5
b14.235 =  .021    Deviation form z scores
b15.234 =  .001    z1 = -.008z2 + .197z3 + .251z4 + .007z5


TABLE 32
DIFFERENCES BETWEEN AVERAGE Z SCORES OF A-B AND D-F GROUPS AND PERCENTAGE
OF D-F GROUP REACHING OR EXCEEDING MEAN OF A-B GROUP

                                      Whole Test  Part 1  Part 2  Part 3  Part 4
Difference                               1.22       .51    1.03    1.09     .77
Percentage D-F reaching or exceeding
  mean of A-B group                     10.93     30.50   15.15   13.79   22.06


The reliability coefficient for the entire test is .90. The coefficient
for each part is shown in Table 33.


TABLE 33
RELIABILITY COEFFICIENTS FOR PARTS OF ENGLISH TRAINING
REV. A (N = 100)
Part 2 Part 3 Part 4 


Part 1 
66 67 


84 88 


The item analysis of Part 1 (spelling) shows that of the twenty-five 
words to be corrected, eight gave differences between the percentage 
correct for the A-B and the D-F groups which were three or more
times their probable errors and four differences were four or more 
times their probable errors. The mean percentage correct for the words 
which did not discriminate significantly between the two groups was 
found to be 74.1. For those words which gave differences between the 
A-B and D-F groups which were at least three times their probable 
errors the mean percentage correct was 62.5 and for those giving dif- 
ferences at least four times their probable errors the mean percentage 
correct was 57.7. This would seem to indicate that the easier words 
are less useful in discriminating between these two groups. This as- 
sumption will be tested experimentally. 


For Part 2 (punctuation) twelve of the sixty differences between the
percentage correct for the superior and the inferior grade groups are
four or more times their probable errors and nineteen are three or
more times their probable errors. There are seventeen items which
show negative differences, all small, between the superior and infe-
rior groups, i.e., the inferior groups secured a slightly higher percent-
age correct. Of these seventeen items fourteen are sentences which
contain no errors. This indicates that superior students, while superior in
detecting errors which are present, do no better or even less well than
the inferior when encountering sentences which are correct. Of the
nineteen items showing significant differences in favor of the A-B
group, thirteen are incorrect sentences, indicating again that the A-B
students do better in detecting errors than in knowing definitely that
a correct sentence is correct. Of the twenty-four remaining items which
show small positive differences in favor of the A-B group, fifteen are
incorrect and nine are correct sentences. Of this last group 66 per
cent are so easy that from 80 to 95 per cent of the D-F group get
them correct. The above would seem to indicate two things for the con-
struction of new items to replace the ineffective ones. First, an effort
must be made to build the correct sentences free from any suggestion
of error. Second, new items involving the principles tested by the
sentences which were too easy should be given in a somewhat more
difficult setting. As the distribution of punctuation situations tested
in this part agrees closely with Foster's (15) punctuation count as to
emphasis on important principles, no marked changes in this respect
are needed except for a slight reduction in the number of items involv-
ing the possessive case of nouns. Those involving independent ele-
ments and series of elements should be increased in number.
The grammar section, Part 3, is the most effective part. Of the
sixty differences in percentage correct between the superior and infe-
rior grade groups, twenty-five are at least three times their probable
errors and sixteen are at least four times their probable errors. Of
the above twenty-five items showing significant differences between
the inferior and the superior students, nineteen are incorrect items
and six are items which are correct. Of the twelve items showing
negative differences, i.e., in favor of the D-F group, ten are correct as
presented. As in Part 2, so for this part the A-B students seem more
prone than the D-F students to mark a correct sentence as wrong.
Further, as in Part 2, the A-B students are superior in detecting in- 
correct items. The use of “who-whom” causes considerable confusion 
for both groups. Of eight items involving these usages four gave 
low negative and three gave low positive differences. Similarly, for 
the ten items involving "subject-verb" relationship seven gave very 
low differences and three gave significant differences in favor of the A-B 
group. 

In Part 3 as in Part 2, the need seems to be for special care in the 
construction of the correct sentences to avoid giving irrelevant sug- 
gestions of error and to present as far as possible only clear cut 
grammatical situations. 

Part 4 attempts to measure sentence sense or judgment. Of the 
forty-five items but four show differences in percentage correct be- 
tween the A-B and the D-F groups which are significant, and of the 
remaining forty-one items a majority show very low positive or nega- 
tive differences. Foster (15, p. 131) in his investigation found, “that 
many of the items do not measure anything reliably. Gains for cer- 
tain items are as high as 26 or 28 per cent. Other items show an 
actual decrease in ‘percents right’ from September to May of 18 or 
19 per cent." In view of the above facts including the findings re- 
ported earlier in the chapter, it was decided to discard this section 
entirely and experiment with a new type of material. 

In brief, the analysis indicates that Part 1 is contributing little to 
prediction, in part probably because of too many easy items; that 
Parts 2 and 3 are valid and effective, and that Part 4 should be re- 
placed by some other type of valid and reliable test material. 


EXPERIMENTAL PROCEDURES AND RESULTS 
New spelling sections of two types were constructed. Three lists
were made employing the same method as in the placement examina-
tion but much more difficult, and two lists made up of words of
similar difficulty but employing the multiple-choice type of response
were used. The words were selected from Master's (38) investigation,
which lists 268 commonly misspelled words with the percentage of
correct spelling found for college freshmen. In addition, unpublished
lists furnished by Professor M. F. Carpenter of the University of
Iowa and based on several years of testing of high school seniors
were employed. All words used are found in the Horn (26) word
list and are therefore words commonly needed in writing.

For the punctuation and grammar sections two parts of forty items 
were constructed for each section. An effort was made to make each 
item contain no error or source of confusion, or to contain a clear- 
cut error for the situation to be tested. The purpose was to construct 
a large enough number of items so that a selection would give only 
highly effective units when the best items of the placement test and 
the new materials were combined. 

To replace Part 4 a test of ability to judge effective word usage 
was constructed. The words were taken from the second five thousand 
of the Horn (26) list. The same word was given in each of three 
sentences. In one of the sentences the use was correct and effective, 
in the others ineffective or incorrect. Two parts were made up of 
forty-two items each. This type of test had been found reliable and 
valid by Foster (15, pp. 135-154) and the materials employed were 
furnished by him. He had submitted the items to a number of expert 
judges and retained only those which were regarded by them as valid. 
The reliability coefficient reported for an eighty-four item test was
.947, and the correlation between three independent theme ratings and
scores on this test was .624 ± .045.
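A figure such as ".624 ± .045" follows the convention, used throughout these tables, of attaching the probable error to a coefficient. The standard formula PE_r = 0.6745 (1 - r^2) / sqrt(N) reproduces, for example, the P.E. column of Table 29 for Form A-1 (N = 272). A short sketch:

```python
import math

def pe_of_r(r: float, n: int) -> float:
    """Probable error of a product-moment correlation r based on n cases."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)

# Form A-1 (Table 29): Parts 1-3 and total score, N = 272
for r in (0.507, 0.496, 0.446, 0.597):
    print(r, round(pe_of_r(r, 272), 3))
```

The four values print as .030, .031, .033, and .026, matching the table. The same formula underlies the criterion that a difference or coefficient should be at least three or four times its probable error.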


The forms and parts are identified as follows:
English A-1
    Part 1. Spelling as in placement examination
    Part 2. Spelling multiple choice
    Part 6. Spelling as in Part 1
English B-1
    Part 1. Spelling as in placement examination
    Part 2. Spelling multiple choice
English A-2
    Part 3. Punctuation. Sentences to be marked as right or
            wrong as to punctuation.
    Part 4. Grammar. Sentences to be marked as right or wrong.
    Part 5. Word usage. The student is required to indicate
            which of three sentences uses a given word most ef-
            fectively.
English B-2
    Part 3. Same as Part 3, Form A-2
    Part 4. Same as Part 4, Form A-2
    Part 5. Same as Part 5, Form A-2


The tables which follow show the results for these materials. 


TABLE 34
INTER-PART CORRELATIONS, ENGLISH TRAINING
EXPERIMENTAL FORMS

English A-1 (N = 247)
           Part 1    Part 2    Part 6
Part 1       --        ?        .849
Part 2       ?         --       .830
Part 6      .849      .830       --

English B-1 (N = 269)
           Part 1    Part 2
Part 1       --       .705
Part 2      .705       --

English A-2 (N = 211)
           Part 3    Part 4    Part 5
Part 3       --       .442      .417
Part 4      .442       --       .519
Part 5      .417      .519       --

English B-2 (N = 216)
           Part 3    Part 4    Part 5
Part 3       --       .518      .382
Part 4      .518       --       .466
Part 5      .382      .466       --
TABLE 35
CORRELATIONS WITH FIRST SEMESTER ENGLISH GRADES, MEANS AND STANDARD
DEVIATIONS

                                                       Standard
                     r       P.E.       Mean          Deviation
English A-1 (N = 247)
Part 1             .631      .025      14.290           5.664
Part 2             .607      .027      43.658           5.874
Part 6             .635      .024      14.832           5.628
English B-1 (N = 269)
Part 1             .415      .034      13.660           5.008
Part 2             .446      .033      42.968           5.826
English A-2 (N = 211)
Part 3             .525      .034      23.098           9.650
Part 4             .546      .033      21.572          11.192
Part 5             .519      .034      27.580           7.248
Total Score        .653      .026      66.280          22.835
English B-2 (N = 216)
Part 3             .503      .034      19.695          11.082
Part 4             .569      .031      17.319          10.815
Part 5             .538      .032      26.346           7.812
Total Score        .654      .025      63.434          23.709
TABLE 36

CORRELATION WITH FIRST SEMESTER GRADES FOR CERTAIN PARTS OF IOWA PLACEMENT
ENGLISH TRAINING REV. B AND EXPERIMENTAL FORMS B-1 AND B-2 (N = 74)

                                    P.E.
Part 1. E. T. Rev. B                .041
Part 1. Experimental B-1            .039
Part 2. Experimental B-1            .046
Part 4. E. T. Rev. B                .073
Part 5. Experimental B-2            .063


TABLE 37
PART RELIABILITY COEFFICIENTS

English A-1 (N = 247)             English B-1 (N = 268)
Part 1    .855                    Part 1    .891
Part 2    .864                    Part 2    .856
Part 6    .892

English A-2 (N = 211)             English B-2 (N = 216)
Part 3    .509                    Part 3    .743
Part 4    .814                    Part 4    .768
Part 5    .886                    Part 5    .891


The inter-part correlations for Forms A-1 and B-1, Table 34, are
high. As these are all tests of spelling, this is to be expected. None
of the correlations between parts of Form A-2 or Form B-2 are very
high.

The correlations of test scores with first semester grades given in 
Table 35 show that there is very little difference in predictive power 
between the multiple-choice method (Part 2) of testing spelling and the 
method used in the placement examination, which is employed in Parts
1 and 6. This is also true so far as the reliability coefficients are con-
cerned (Table 37).

The correlation coefficients for Forms A-2 and B-2 indicate that
the two forms are about equally effective for prediction. As the pur-
pose of Parts 3 and 4 is to secure additional effective items to be
combined with the effective items of Parts 2 and 3 respectively of the
placement examination, the item analysis to be summarized later will
provide an additional basis for their evaluation. So far as prediction
is concerned these new parts are quite effective. Part 3 (new), Form
A-2, is lower in reliability than Part 3 of Form B-2. This can be
improved by retaining only the effective and reliable items. Part 4
(experimental) in both Forms A and B is far more reliable than Part 3
of the placement test Form A, which it is intended to supplement.
By combining the best items of both, a more valid and reliable section
should result.

Part 5 of the experimental forms is effective for the prediction of 
grades and is very reliable. When compared with Part 4 of the 
placement examination, Table 36, it is found to be much more effective 
for the prediction of grades and is far more reliable. 

Table 36 also permits a comparison with the new spelling section of 
the placement examination. For this sample the more difficult new 
section, Part 1 Experimental B-1, is but little more effective than the 
easier placement test. The multiple-choice technique is slightly less 
effective than the method employed in the placement test as shown 
by the somewhat lower correlation coefficients. 

A comparison of the relative merits of the multiple-choice and 
error correction methods of testing spelling reveals very little difference 
thus far. They are almost equal in predictive power and reliability. 
The multiple-choice test can be much more rapidly scored. On the 
other hand, the mean scores of the multiple-choice sections are roughly 
43 for each form, which means that on an average 43 of the 50 items 
are checked correctly. It seems to be an easier type of test than the 
error correction, in which less than 60 per cent of the words on an
average were given correctly. 


The results of the item analysis, Table 38, show that Parts 1 and
6 of Form A-1 and Part 1 of Form B-1, which are tests of spelling
employing the same method as the placement examination, contain
88.0, 95.8, and 72.0 per cent respectively of effective items. The
spelling sections employing the multiple-choice technique, Part 2 Form
A-1 and Part 2 Form B-1, contain 70 and 40 per cent respectively
of effective items. The placement technique gives a much larger pro-
portion of effective items and, so far as these forms are concerned, it
is more consistently effective than the multiple-choice technique. The
decision, it would seem, must be in favor of the placement technique,
inasmuch as the two types have been previously shown to be almost
equal in other respects.

TABLE 38
NUMBER OF ITEMS SHOWING DIFFERENCES BETWEEN MEAN PERCENTAGE CORRECT
FOR THE A-B AND D-F GROUPS WHICH ARE THREE OR MORE
TIMES THEIR PROBABLE ERRORS

                 Total     No. 3 or More
                 Items     Times P.E.'s
Form A-1
  Part 1           25           22
  Part 2           50           35
  Part 6           24           23
Form A-2
  Part 3           40           23
  Part 4           40           28
  Part 5           42           30
Form B-1
  Part 1           25           18
  Part 2           50           20
Form B-2
  Part 3           40           28
  Part 4           40           30
  Part 5           42           34

The new punctuation sections, Part 3 of both Forms A-2 and B-2,
show 57.5 and 70 per cent respectively of effective items. The
smaller proportion of reliable items explains in part the lower reliability
of Part 3 Form A-2. Both of the new punctuation sections show a
larger proportion of effective items than the placement punctuation
section, which contains 31.6 per cent of such items. The marked in-
crease in proportion of effective items indicates that the new sections
have overcome some of the defects of the placement section which were
brought out in the analysis above.

The placement grammar section contains 41.6 per cent of effective 
items. The new grammar sections, Part 4 of both Forms A-2 and 
B-2, show 70 and 75 per cent respectively of such items. This also 
indicates that some of the defects found in the analysis of the place- 
ment section have been overcome. 

The word usage tests, Part 5 of Forms A-2 and B-2, show 71.4 
and 80.9 per cent respectively of effective items. Part 4 of the place- 
ment examination which this part is designed to replace contains 
but 8.8 per cent of effective items. Part 5 (new) is from every point
of view studied here greatly superior to Part 4 of the placement
examination.
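The percentages of effective items quoted above follow from the counts in Table 38 and in the analysis of the placement sections: 19 of 60 punctuation items, 25 of 60 grammar items, and 4 of 45 items in Part 4. Several of the printed figures (80.9, 31.6, 41.6, 8.8) match only when the quotient is truncated to one decimal rather than rounded. The sketch below therefore assumes truncation, and it uses integer arithmetic so that floating-point error cannot shift a digit.

```python
def pct_tenths(effective: int, total: int) -> int:
    """Per cent of effective items, in tenths of a per cent, truncated."""
    return (1000 * effective) // total

def fmt(tenths: int) -> str:
    """Format tenths of a per cent as 'NN.N'."""
    return f"{tenths // 10}.{tenths % 10}"

# (effective items, total items)
counts = {
    "Spelling A-1 Part 1": (22, 25),
    "Spelling A-1 Part 6": (23, 24),
    "Spelling B-1 Part 1": (18, 25),
    "Punctuation A-2 Part 3": (23, 40),
    "Punctuation B-2 Part 3": (28, 40),
    "Word usage A-2 Part 5": (30, 42),
    "Word usage B-2 Part 5": (34, 42),
    "Placement punctuation": (19, 60),
    "Placement grammar": (25, 60),
    "Placement Part 4": (4, 45),
}
for name, (k, n) in counts.items():
    print(f"{name}: {fmt(pct_tenths(k, n))}")
```

Rounding instead of truncating would give 81.0, 31.7, 41.7, and 8.9 for the last four comparisons. The conclusions are unaffected either way.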

The conclusions which seem warranted by the above results are: 
(1) that the experimental parts are all fairly high in effectiveness 
for the prediction of grades; (2) that the types of spelling tests em-
ployed are about equally effective for the prediction of grades, and that the
placement technique is slightly more reliable and shows a larger pro-
portion of effective items; (3) that Part 5 (experimental) is more
valid and reliable than Part 4 (placement), for which it is to be sub-
stituted; and (4) that a combination of the best items and parts of the
Iowa Placement English Examination and the new materials will
give an instrument which is more effective and valid than the present
placement test.


CHAPTER VI 


FRENCH AND SPANISH TRAINING AND FOREIGN LAN- 
GUAGE APTITUDE: ANALYTICAL AND 
EXPERIMENTAL RESULTS 


FRENCH TRAINING 
ANALYTICAL PROCEDURES AND RESULTS 


The French Training Examination consists of four parts. Part 1
is a vocabulary test of 60 items based on the Henmon word count
(time, 10 minutes). The recall type of response is employed, i.e., the
French word is given and the directions ask that the English equiva-
lent be written on the line opposite the word. Part 2, a test of
grammar, consists of 40 items (time, 10 minutes). The essential points
of grammar are tested by requiring the student to recognize and under-
line an error in a sentence and to write the correct response on a line
to the right of the sentence. Part 3 is a test of verb usage consisting
of 40 items (time, 10 minutes). The response employed is the mul-
tiple-choice, in which the student indicates which of four responses
given is the correct one. The last part is a test of French reading
comprehension, in which three paragraphs of increasing difficulty are
presented, followed by questions (in English) which are to be an-
swered in English. The responses require the writing of a word or
short phrase.


In considering the examination as a whole and the purpose for
which it was intended, it seemed that a type of response more ob-
jective in nature in Parts 1, 2, and 4 could be found which would
not decrease the predictive power [...]
permit more rapid and accurate [...]
[...]uring a sample to be analyzed and the [...]
standard deviation of the sample with [...]
norms have already been given in Chapter III.
Table 39 gives the results of the application of the multiple corre-
lation and regression techniques to the problem of determining the
relative contribution of each part to the prediction of grades.

The multiple coefficient found, .527, indicates that the examination
is functioning about as well as the better tests of this type usually
do.

The regression coefficients with scores expressed as z scores indicate
that the largest relative contribution is made by the reading
comprehension section (Part 4), with grammar (Part 2) next, verbs
(Part 3) third, and vocabulary (Part 1) adding nothing that is not
already cared for by the other parts. This, however, does not mean
that the vocabulary section should be omitted, as it would be
necessary in any study of specific deficiencies.

TABLE 39
PREDICTIVE POWER FOR FIRST SEMESTER GRADES IN FRENCH OF THE VARIOUS PARTS
OF THE FRENCH TRAINING EXAMINATION (REVISED) A
(MULTIPLE CORRELATION AND REGRESSION COEFFICIENTS)
FRENCH TRAINING REVISED A (N = 200)

Variables
  Variable 1   First semester French grades
  Variable 2   Score Part 1
  Variable 3   Score Part 2
  Variable 4   Score Part 3
  Variable 5   Score Part 4

Zero Order Coefficients
Variable     2       3       4       5       A.M.     Sigma
   1       .386    .441    .378    .487     .419     1.133
   2               .588    .555    .713    12.95     4.041
   3                       .685    .580    12.11     7.101
   4                               .525    10.45     8.943
   5                                       28.25    13.149

Multiple Coefficient
  R1.2345 = .527

Regression Coefficients and Equations
  b12.345 = -.004     Deviation Form, Raw Scores
  b13.245 =  .033     x1 = -.004x2 + .033x3 + .008x4 + .030x5
  b14.235 =  .008     Deviation Form, z Scores
  b15.234 =  .030     z1 = -.014z2 + .207z3 + .063z4 + .348z5
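The multiple coefficient and the z-score weights of Table 39 can be recovered from the zero-order coefficients alone. The sketch below is a minimal illustration that assumes the standard normal-equations method (the study does not describe its computing routine): it solves for the beta weights by plain Gaussian elimination, takes R² as the sum of each beta times its validity coefficient, and converts betas to raw-score weights with b = beta × sigma1 / sigmaj. Because the published coefficients are rounded to three decimals, the recomputed weights come out close to, but not identical with, those in the table, while R agrees with .527 to within rounding.

```python
import math

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        # Bring the largest remaining pivot into place to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

# Zero-order coefficients from Table 39 (variables 2-5 are Parts 1-4).
r_parts = [
    [1.000, 0.588, 0.555, 0.713],
    [0.588, 1.000, 0.685, 0.580],
    [0.555, 0.685, 1.000, 0.525],
    [0.713, 0.580, 0.525, 1.000],
]
r_grades = [0.386, 0.441, 0.378, 0.487]   # r12, r13, r14, r15
sigma_grades = 1.133
sigma_parts = [4.041, 7.101, 8.943, 13.149]

# Beta (z-score) weights solve the normal equations r_parts @ beta = r_grades.
beta = solve(r_parts, r_grades)
# R^2 equals the sum of each beta times its correlation with grades.
multiple_r = math.sqrt(sum(b * r for b, r in zip(beta, r_grades)))
# Raw-score weights: b1j = beta_j * sigma_1 / sigma_j.
raw_b = [b * sigma_grades / s for b, s in zip(beta, sigma_parts)]

for part, (z_w, raw_w) in enumerate(zip(beta, raw_b), start=1):
    print(f"Part {part}: beta = {z_w:+.3f}, b = {raw_w:+.4f}")
print(f"R1.2345 = {multiple_r:.3f}")
```

The ordering of the weights (Part 4 largest, then Part 2) matches the interpretation given in the text.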

Another approach to determining the discriminatory power of each
part, as well as of the entire examination, consisted of a
determination of the difference between the average z scores of the
A-B and the D-F groups and also of the percentage of D-F students
reaching or exceeding the mean of the A-B students.


TABLE 40
DIFFERENCES BETWEEN AVERAGE z SCORES OF A-B AND D-F GROUPS AND PERCENTAGE
OF D-F GROUP REACHING OR EXCEEDING MEAN OF A-B GROUP

                              Whole Test   Part 1   Part 2   Part 3   Part 4
Difference                       1.33       1.21     1.05      .93     1.19
Percentage D-F reaching or
  exceeding mean A-B             9.18      11.31    14.69    17.62    11.70


The z score difference for the whole test is exceeded by only one other
test in the Iowa Placement Examination Series, indicating that its
power to discriminate between A-B students and D-F students is high
for tests of this type. The individual parts also rank high in this
respect: fewer than half of the part values for all the examinations
show a z score difference larger than 1.00, and forty per cent of the
z score differences are below .90.
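The percentages in Table 40 are consistent with a simple normal-curve calculation: if D-F scores are taken as normally distributed with unit standard deviation on the z scale, the share of D-F students at or above the A-B mean is the upper-tail area beyond the z-score difference. This is an inference from the published figures, not a procedure the text spells out, so the sketch below should be read under that assumption; it reproduces all five percentages of the table to two decimals.

```python
import math

def percent_reaching_mean(z_difference):
    """Percentage of the D-F group at or above the A-B mean.

    Assumes D-F scores are normal with unit SD on the z scale, so the
    result is the upper-tail area beyond the z-score difference.
    """
    upper_tail = 0.5 * math.erfc(z_difference / math.sqrt(2))
    return 100 * upper_tail

# z-score differences from Table 40.
table_40 = {"Whole test": 1.33, "Part 1": 1.21, "Part 2": 1.05,
            "Part 3": 0.93, "Part 4": 1.19}
for label, diff in table_40.items():
    print(f"{label}: {percent_reaching_mean(diff):.2f} per cent")
```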

Table 41 gives the reliability coefficient of each part of the
examination. The reliability coefficient of the whole test, as reported
above, is .93.


TABLE 41
RELIABILITY COEFFICIENTS FOR EACH PART, FRENCH TRAINING
REVISED A (N = 200)

Part 1   Part 2   Part 3   Part 4
 .83      .87      .94      .86


The test as a whole and for each part compares very favorably with 
the best existing educational tests so far as reliability is concerned. 

The words for the vocabulary test were selected from the Henmon 
(23) word count and apparently a sampling was made through the 
entire list. 

In comparing the discriminatory power of each word with its
frequency of occurrence, the new French word count (41) was employed
because of its greater validity. Items 15, 42, 45, 48, 51, 54, and 57
do not occur in the first 6,000 words in frequency and range and are
therefore of doubtful validity in this type of examination. Thirty-three
of the 60 words range in frequency from 2,000 on up, leaving
less than half the list employing the more frequently used words of
the language. The relationship between frequency and the power of an
item to discriminate between A-B students and D-F students was studied
to find out whether the more common or the less common words were
more effective. The results are shown in Table 42.


TABLE 42
MEAN PERCENTAGE OF DIFFERENCE BETWEEN ITEMS CORRECT FOR A-B STUDENTS AND
D-F STUDENTS AS RELATED TO FREQUENCY OF OCCURRENCE

Word Frequency            Number of Words    Mean Difference
First Thousand                   11               18.18
Second Thousand                  16               18.43
Third Thousand                   12                9.58
Fourth Thousand                   9               11.66
Above Fourth Thousand            12               13.75


An examination of the words above the third thousand in frequency 
demonstrated that practically every word in these groups which dis- 
criminated well between A-B and D-F groups was rather similar to 
its English equivalent and further that they were not words commonly 
employed by beginning college students. These words are: musculaire, 
paternel, l'affirmation, l'acceptation, and intact. It seems reasonable 
to conclude that the superior students infer the English equivalent of 
these words more readily than the inferior students because of a supe- 
rior English vocabulary. As some of these words do not occur at all 
in the 85 sources employed in the word count, it does not seem probable 
that many students would have met the words prior to the examination 
and therefore instead of testing French Training such words are meas- 
ures of language aptitude or intelligence. If these words are excluded 
in the computation of values for Table 42 above, the mean difference 
for the fourth thousand drops to 6.42 and for the group above four 
thousand the mean difference drops to 8.33. The Pearson correlation
coefficient between the percentage of difference between the A-B and
the D-F groups and word frequency was found to be -.192 ± .083.

It seems reasonable to conclude that in general the less common 
words, defined as those above the first two thousand in frequency of 
occurrence, are less effective than the more common words of the 
language. This probably would not apply to exceedingly common 
words, such as those in the first one hundred in frequency. There is 
one of these in the list and it has but fair discriminatory power. 

In the interpretation of the differences between the percentages
correct for the A-B and D-F groups, the probable errors of the
differences between the proportions must be considered. Thus, for the
number of cases upon which the proportions are based for Part 1 of the
French Training test, the highest probable error of the difference
would be roughly six per cent. If a difference three times its probable
error is considered significant, for the reasons enumerated in
Chapter III, then any difference of eighteen per cent or over is
significant. Many of the differences smaller than this are also
significant; e.g., item 1, with a difference of 13.9 per cent between
the A-B and D-F groups, is slightly over three times its probable error
of 4.22.
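The significance rule applied throughout the item analyses compares a difference between two percentages with its probable error. A minimal sketch, assuming the usual formula for two independent groups, PE = .6745 × √(p1q1/N1 + p2q2/N2), follows. The group sizes in the usage lines are hypothetical, because the sizes of the A-B and D-F groups are not given in this section; they simply show that a largest probable error (p = .5 in both groups) of about six per cent corresponds to groups of roughly 60 students each. The last line applies the three-P.E. criterion to the figures quoted for item 1.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 * standard error

def pe_difference(p1, n1, p2, n2):
    """Probable error of p1 - p2 for two independent groups (proportions 0-1)."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return PE_FACTOR * se

def is_significant(difference, probable_error, criterion=3.0):
    """Criterion used in the text: significant at 3 or more probable errors."""
    return difference / probable_error >= criterion

# Hypothetical group sizes (not reported here): 63 A-B and 63 D-F students.
worst_case = pe_difference(0.5, 63, 0.5, 63)
print(f"largest P.E. of a difference: {100 * worst_case:.1f} per cent")
print(f"significance threshold: {300 * worst_case:.1f} per cent")

# Item 1 of Part 1, using the difference and P.E. quoted in the text.
print(f"item 1 ratio: {13.9 / 4.22:.2f}", is_significant(13.9, 4.22))
```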


It is of interest to …

… significant, and in the case of sentences 13 and 18 even when the
error was corrected the sentences would still not be well stated.

A comparison of the percentage of correct responses for each item 
for the A-B and D-F groups showed that 20 of the differences between 
the A-B and D-F percentages were at least 4 times their probable 
errors and that 24 of the 40 differences were 3 or more times their 
probable errors, indicating the effectiveness of 60 per cent of the items. 


In general Part 2 may be described as reasonably valid from the 
point of view of content and very effective in discriminating between 
the A-B and D-F students. 

As in Part 1, so in Part 2 no conflict was found between content 
validity and predictive validity. A section stressing the most im- 
portant principles of grammar is effective as a predictive measure. 


The average z score of the A-B group and of the D-F group and 
the z score difference for the verb test, Part 3, indicated that it was 
not as effective in discriminating between the superior and inferior 
students as were the other parts of the test. The regression coefficient 
in z score form was less than one-third that of the grammar section 
and less than one-fifth that of Part 4. This does not imply that the
verb test is not useful; it must be pointed out that the discrimination
of this test as a whole ranked very high among the tests of this
series. Accordingly, a part may be somewhat lower in discriminatory
power when compared with other parts of the same test and yet show
power above the average of the entire series.

A consideration of the detailed item analysis for Part 3 revealed
that 16 of the 40 items gave differences between the percentage
correct for the A-B group and the D-F group which were at least four
times their probable errors and 22 which were at least three times
their probable errors.

The major defect found in this section was that of content validity. 
It would seem that a valid verb section from the point of view of 
content should contain items emphasizing in proper proportions the 
most important conjugations, tenses, persons, and irregular and regular 
verbs. The test under consideration employed only common regular 
and irregular verbs. The persons are, in order of emphasis, third 
singular 16 items, third plural 7 items, first singular 6 items, first and 
second plural 3 items each, and second singular 1 item. Four items 
involve no test of persons, dealing with uses of infinitives. It would
seem that there might be fewer third person singular items, with slight
increases in first and second person plural and third person plural.
In the absence of an objective criterion, however, such criticisms
must be given with caution. The tense distribution was found to be
as follows:


Tense                     No. of Items
Present indicative
Imperfect
Past definite
Past indefinite
Future
Conditional
Conditional perfect
Infinitive
Present subjunctive
Imperfect
Pluperfect


… he found constituted about … in importance were: … conditional
perfect, imperative … definite was important only in reading …
uses of the infinitive. Of the 22 differences which were at least three
times their probable errors, only one involved an uncommon tense. Of the
four dead tense items in the test, three gave practically no difference 
between the superior and inferior groups. Of 36 items involving com- 
mon or fairly common tenses, 21 gave differences which were at least 
three times their probable errors and 14 gave differences which were 
at least four times their probable errors. The extreme emphasis on 
the present subjunctive was not found to be justifiable on grounds of 
predictive power as none of these items was high in predictive power 
and half of them were ineffective. 

The conclusion that the more commonly used tenses are as effective 
as or more effective than the less common seems reasonable in the 
light of the above. It would seem that a verb section of valid content 
as an achievement test would at the same time be more effective as a 
placement test than one composed of the infrequently used verb ele- 
ments. 

The fourth part of the French Training examination was found to 
be very effective. It is made up of three paragraphs taken from stand- 
ard French works. No question was raised by the authorities exam- 
ining this part as to its content validity. 

The difference between the average z score of the A-B group and 
of the D-F group (Table 40) revealed marked power of discrimination.
The regression coefficient (Table 39) indicated that for predictive 
purposes this part was the most important of the four parts of the 
examination. Of the twenty items, fourteen show significant differ- 
ences between the mean percentage correct for the A-B group and for 
the D-F group. 

The desirability of a more objective type of answer, however, was
recognized in the experimental procedures to be described later.

EXPERIMENTAL PROCEDURES AND RESULTS

Experimental examinations were prepared in an attempt to apply
the principles revealed by the above analysis. As it was desired to
have two forms in the reconstructed series, all materials were
prepared so far as possible in equivalent duplicate forms. Items for
the vocabulary sections were selected from the new French word book
(41), referred to above. Two sets of 60 items were prepared by
selecting every twenty-fifth word in the list, not including those so
commonly used that they were within the first two hundred of the list.
Each list of 60 words was set up in two forms, one giving only the
French word with instructions to write the English equivalent after
it, the other presenting the same words with five words following,
only one of which was an English equivalent. The question raised
was: Is the recall or the multiple-choice method of testing more
effective for the prediction of first semester grades? If the two are
equally effective, then because of its greater objectivity the
multiple-choice technique would seem preferable.
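The vocabulary sampling rule described above (every twenty-fifth word, skipping the two hundred most frequent) can be stated as a short function. This is an illustrative sketch only: the word list is a hypothetical stand-in for the frequency-ranked new French word book, the starting point is assumed to be the first word after rank 200, and the text does not say how the sampled words were divided between the two sets, so dealing them out alternately is an assumption.

```python
def sample_vocabulary(ranked_words, step=25, skip=200, items_per_set=60, sets=2):
    """Systematically sample a frequency-ranked list into equivalent sets.

    ranked_words: words ordered from most to least frequent.
    Takes every `step`-th word after the first `skip` words and deals the
    sampled words alternately to the sets (an assumption, see lead-in).
    """
    needed = items_per_set * sets
    sampled = ranked_words[skip::step][:needed]
    if len(sampled) < needed:
        raise ValueError("word list too short for the requested sample")
    return [sampled[i::sets] for i in range(sets)]

# Hypothetical stand-in for the frequency-ranked word book: ranks 1..6000.
word_book = [f"word_{rank}" for rank in range(1, 6001)]
form_1, form_2 = sample_vocabulary(word_book)
print(len(form_1), len(form_2), form_1[:3], form_2[:3])
```

Dealing alternately keeps the two sets matched in average frequency, which is one simple way to aim at the equivalent forms the study wanted.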


A new grammar section of 40 items was prepared in two forms
employing only the most common principles. For each form the same
items were presented as in the Iowa Placement French Examination
and with the multiple-choice response. Idioms were omitted from the
… the Modern Language Investigation … most common idioms were
used in …

The multiple-choice type of response employed in the placement
series was retained for the experimental verb test. The new materials
were prepared by building sentences employing the most common
regular and irregular verbs. The persons and tenses were included …
The aim was to build a valid verb … conjugations, persons, tenses,
auxiliary … verbs.

… employed in the placement series was tried … items each giving a
verb form followed … but one of which was correct. This section
stressed important irregular verbs, …

… paragraphs used in the placement … of response. Instead of asking
… to questions based on each … understanding of an entire sentence
or even the whole paragraph. This method was compared with that
employed in the placement test.
The experimental forms consist of parts as follows:

French A-1 and French B-1
  Part 1. Vocabulary, multiple choice, 60 items
  Part 2. Grammar, multiple choice, 40 items
  Part 3. Idioms, completion, 20 items
  Part 4. Reading comprehension, true-false, A-1, 36 items; B-1, 38 items

French A-2 and French B-2
  Part 5. Vocabulary, recall type, 60 items
  Part 6. Grammar, error correction, 40 items
  Part 7. French verb forms, multiple choice, 31 items
  Part 8. Verbs, French to English, multiple choice, 25 items


The items for Parts 1 and 5 for Forms A-1 and A-2, and also for
Forms B-1 and B-2, are identical except for the type of response, as
indicated above. The same is true of Parts 2 and 6. In each case
the purpose was to compare the recall or error-correction response
types with the multiple-choice method of testing.


Where techniques are to be compared, results are given in the tables
below for two groups, in each case a larger group and a smaller group.
In the case of the smaller groups, the results compared are based on
the same students. To counterbalance practice effects, half of the
group were given the parts employing one technique the first day, the
other half being tested with the other technique to be compared. At
the next class period, two days later, the order was reversed. For
these groups results are also available on French Training Form B of
the placement series. In the case of the larger groups of 171 or more,
the results to be compared are not based on the same students. Thus
one set of results will serve as a check on the other.
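The counterbalanced schedule used for the smaller groups can be sketched as follows. The random split, the seed, and the technique labels are assumptions for illustration; the text says only that half the group took one technique first and the other half the other, with the order reversed at the next class period.

```python
import random

def counterbalance(students, first, second, seed=None):
    """Split students into two halves that take the techniques in opposite orders.

    Returns {student: (day_1_technique, day_2_technique)}.
    """
    roster = list(students)
    random.Random(seed).shuffle(roster)   # random split is an assumption
    half = len(roster) // 2
    schedule = {}
    for s in roster[:half]:
        schedule[s] = (first, second)
    for s in roster[half:]:
        schedule[s] = (second, first)
    return schedule

# Hypothetical roster of 84 students (the size of the French A group).
plan = counterbalance(range(84), "multiple-choice", "recall", seed=1)
day_one = [order[0] for order in plan.values()]
print(day_one.count("multiple-choice"), day_one.count("recall"))
```

Because every student takes both techniques and each technique appears first for half the group, any practice gain from the first sitting is spread equally over the two methods.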


The inter-part correlations based on the larger groups are given in
Table 43. The correlation of each part and of the whole test with
first semester French grades is given in Table 46.


The inter-part correlations are very similar to those for the Iowa
Placement French Examination shown in Table 39. The rather high
correlation between grammar, Part 2, and idioms, Part 3, in Form A-1
would indicate that the idiom section tests somewhat the same
functions as the grammar section. This relationship, however, is not
so high in the case of Form B-1. The high correlation in both Forms
A-2 and B-2 between the grammar and verb sections (Parts 6 and 7)
indicates that these two functions are somewhat similar.
TABLE 43
INTER-CORRELATIONS OF PARTS

French A-1 (N = 171)
           Part 1   Part 2   Part 3   Part 4
Part 1      ---      .517     .573     .549
Part 2      .517     ---      .783     .546
Part 3      .573     .783     ---      .542
Part 4      .549     .546     .542     ---

French A-2 (N = 227)
           Part 5   Part 6   Part 7   Part 8
Part 5      ---      .569     .645     .480
Part 6      .569     ---      .809     .481
Part 7      .645     .809     ---      .573
Part 8      .481     .481     .573     ---

French B-1 (N = 234)
           Part 1   Part 2   Part 3   Part 4
Part 1      ---      .588     .537     .477
Part 2      .588     ---      .594     .341
Part 3      .537     .594     ---      .702
Part 4      .477     .341     .702     ---

French B-2 (N = 187)
           Part 5   Part 6   Part 7   Part 8
Part 5      ---      .646     .261     .636
Part 6      .646     ---      .795     .528
Part 7      .261     .795     ---      .639
Part 8      .636     .528     .639     ---
TABLE 44
CORRELATIONS BETWEEN CERTAIN PARTS OF IOWA PLACEMENT FRENCH B REV. AND
SIMILAR PARTS OF EXPERIMENTAL FORM B (N = 43)

                                                              r      P.E.
Exp. grammar error-correction vs. I.P.E. grammar            .746    .045
Exp. grammar multiple-choice vs. I.P.E. grammar             .739    .046
Exp. verbs multiple-choice vs. I.P.E. verbs                 .655    .057
Exp. read. comp. true-false vs. I.P.E. read. comp.          .718    .048


… comprehension. As the Iowa Placement verb section was neither as
effective nor as valid as the new verb materials, the somewhat lower
coefficient between these two parts may be in favor of the newer part.


For the smaller groups which were given both A-1 and A-2 or B-1
and B-2, inter-part correlations (Table 45) were found only between
parts of A-1 and parts of A-2, or of B-1 and B-2, as it was not
necessary to duplicate those given in Table 43.


TABLE 45
INTER-CORRELATIONS OF PARTS OF RELATED FORMS

French A-1 and French A-2 (N = 84)
           Part 5   Part 6   Part 7
Part 1      .682     .350     .522
Part 2      .363     .778     .738
Part 3       …        …       .677
Part 4       …        …       .446

French B-1 and French B-2 (N = 81)
           Part 5   Part 6   Part 7
Part 1      .708     .518     .535
Part 2      .581     .818     .701
Part 3       …        …       .616
Part 4       …       .285     .568


The inter-part correlations in Table 45 show that the sections
designated as measures of the same functions but employing different
types of responses, while moderately related, do not test exactly the
same thing. Part 1, vocabulary multiple-choice, and Part 5, vocabulary
recall, correlate .682 in Form A and .708 in Form B, indicating that
the two techniques are not measuring the same thing, as the
prediction of one score from the other would contain a wide margin
of error. A similar statement may be made for the relationship
between Part 2, grammar multiple-choice, and Part 6, grammar error
correction, except that the relationship between the two techniques is
considerably higher. The inter-correlations between the remaining
parts indicate a lower relationship, except for the grammar and verb
sections.
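The "wide margin of error" can be made concrete with the coefficient of alienation, k = √(1 − r²), which gives the standard error of estimate as a fraction of the standard deviation of the score being predicted. This is a standard supplement rather than a computation reported in the study. The sketch uses r = .682 and .708 from Table 45 and, for illustration, the Part 5 standard deviation of the French A group in Table 47 (7.214): even at r near .70 the error of estimate remains about 70 per cent as large as the standard deviation itself.

```python
import math

def alienation(r):
    """Coefficient of alienation: sqrt(1 - r^2)."""
    return math.sqrt(1 - r * r)

def standard_error_of_estimate(r, sd_predicted):
    """Standard error when one score is predicted from another."""
    return sd_predicted * alienation(r)

for form, r in (("A", 0.682), ("B", 0.708)):
    print(f"Form {form}: r = {r:.3f}, k = {alienation(r):.3f}")

# Predicting Part 5 (recall) from Part 1 (multiple choice), French A group.
print(f"SE of estimate: {standard_error_of_estimate(0.682, 7.214):.2f} points")
```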

The correlations for each part and for various combinations of parts 
with grades are given in Table 46 for the larger groups and in Table 
47 for the smaller groups. The means and standard deviations are 
included for the larger groups. 


68 IOWA STUDIES IN EDUCATION 


TABLE 46
MEANS, STANDARD DEVIATIONS, AND CORRELATION COEFFICIENTS WITH FIRST SEMESTER
GRADES

French A-1 (N = 171)
          r with                         Standard
          Grades    P.E.      Mean       Deviation
Part 1     .632     .030     44.559        8.421
Part 2     .622     .031     27.546        6.864
Part 3     .568     .035      6.562        3.561
Part 4     .560     .035     20.508        7.818
Total      .734     .023     98.935       21.993

French A-2 (N = 227)
          r with                         Standard
          Grades    P.E.      Mean       Deviation
Part 5     .623     .027     39.317        7.308
Part 6     .584     .029     20.128        9.226
Part 7     .651     .025     21.610        5.342
Part 8     .471     .035     17.725        3.708
Total      .631     .026     99.233       22.106

French B-1 (N = 234)
          r with                         Standard
          Grades    P.E.      Mean       Deviation
Part 1     .557     .030     44.397        8.172
Part 2     .558     .030     28.115        7.347
Part 3     .557     .030      5.812        3.123
Part 4     .374     .038     19.654        8.925
Total      .638     .026     98.239       21.747

French B-2 (N = 187)
          r with                         Standard
          Grades    P.E.      Mean       Deviation
Part 5     .522     .036     39.819        8.115
Part 6     .543     .034     18.948        8.334
Part 7     .540     .034     20.638        5.784
Part 8     .510     .036     17.846        4.308
Total      .595     .031     99.008       23.443
TABLE 47
CORRELATION COEFFICIENTS WITH FIRST SEMESTER GRADES, FORMS A-1 AND A-2 AND
B-1 AND B-2 GIVEN TO SAME GROUP

French A-1 and A-2 (N = 84)
                                                      Standard
                      r       P.E.       A.M.        Deviation
(A-1)
Part 1              .537     .052      43.501         9.186
Part 2              .541     .051      24.642         5.196
Part 3              .555     .050       4.690         3.003
Part 4              .569     .050      19.214         7.550
(A-2)
Part 5              .600     .047      37.572         7.214
Part 6              .539     .052      14.429         8.154
Part 7              .554     .050      18.690         5.912
Part 8              .531     .052      17.120         4.128
Sum of 1+2+4+7      .694     .037     106.582        22.694
Sum of 4+5+6+7      .708     .037      89.081        22.946

French B-1 and B-2 (N = 81)
                                                      Standard
                      r       P.E.       A.M.        Deviation
(B-1)
Part 1              .443     .060      43.536         8.526
Part 2              .689     .039      25.192         6.372
Part 3              .503     .054       4.432         2.780
Part 4              .389     .062      21.014         9.086
(B-2)
Part 5              .569     .050      37.755         8.802
Part 6              .625     .045      16.192         8.328
Part 7              .676     .039      18.574         5.829
Part 8              .522     .053      16.896         4.622
Sum of 1+2+4+7      .613     .046     107.996        23.681
Sum of 4+5+6+7      .606     .047      94.156        27.312

The correlations with grades for the larger groups, Table 46, show
that for these samples the tests as a whole and in part are effective
in predicting first semester grades. Form A-1 and Form B-1, which
were built as nearly as possible as equivalent forms, correlate with
grades .734 and .638 respectively. These correlations are based on
totally different groups. Similarly, Forms A-2 and B-2 correlate with
grades .631 and .595 respectively. That Forms A-1 and B-1 are nearly
equivalent in whole and in parts is shown by the approximate
agreement of the means and standard deviations.
In order to determine the relative effectiveness of various techniques
of measuring a particular function, the results obtained on the larger
but separate groups were compared with those for the smaller groups
shown in Table 47. These smaller groups were given both Forms A-1
and A-2 or Forms B-1 and B-2. That the smaller groups are reasonably
representative is shown by the rather close agreement (in most cases)
of their means and standard deviations with those of the larger
groups. By employing the results from both types of comparisons, the
reliability of the conclusions drawn will be increased.


In comparing the vocabulary sections, the multiple-choice type,
Part 1, correlates higher with first semester grades than the recall
type, Part 5, for both Forms A and B in the case of the larger groups
(Table 46), whereas for the smaller groups (Table 47) the recall type
correlates higher. In general the evidence points to no great
superiority for either type so far as prediction is concerned.
Similarly, so far as reliability (Table 48) is concerned, no significant
differences are found.

A comparison of the grammar sections, Part 2 (multiple-choice) and
Part 6 (error correction), reveals no significant differences in
predictive power for either method; however, all the differences
observed are in favor of the multiple-choice response type. The
reliability coefficients (Table 48) for the error-correction type tend
to be somewhat higher than those for the multiple-choice type.


TABLE 48
RELIABILITY COEFFICIENTS FOR PARTS

French A-1 (N = 179)          French B-1 (N = 242)
Part 1    .969                Part 1    .897
Part 2    .853                Part 2    .851
Part 3    .847                Part 3    .812
Part 4    .891                Part 4    .923

French A-2 (N = 236)          French B-2 (N = 209)
Part 5    .839                Part 5    .914
Part 6    .926                Part 6    .894
Part 7    .864                Part 7    .870
Part 8    .809                Part 8    .778


The idiom section, Part 3, compares favorably with the other parts
so far as prediction is concerned. It is also very reliable for a
20-item test.

The true-false method of testing reading comprehension shows higher
predictive power for Form A-1 than for Form B-1. In either case,
however, the results indicate that this method shows promise. The
reliability coefficients for both forms are high, being .891 for Form
A-1 and .923 for Form B-1.

The verb section, Part 7, exhibits uniformly good predictive power
with all groups and for both forms. The reliability (Table 48) is
high for a 10-minute test.

The new verb section, Part 8 (French to English) is effective in 
prediction and while not so reliable as the other sections gives promise 
of being useful as a sub-section of Part 7. 

The correlations with grades for similar parts of the Iowa Placement
French Examination B and the Experimental Form B, Table 50, while
based on a rather small group, indicate that these parts are of about
equal effectiveness for prediction. The observed differences are in
favor of the experimental form in the case of grammar and verbs.
These differences, however, are not statistically reliable in the case
of grammar. The unusually high coefficient for Part 7 is undoubtedly
due to large chance errors. Similarly, the difference between the
coefficients for the reading comprehension sections is in favor of the
new material but is not statistically reliable.

TABLE 49
RELIABILITY COEFFICIENTS FOR VARIOUS COMBINATIONS OF PARTS, EXPERIMENTAL
FORMS A AND B

                                  Coefficient      N
Sum of 1A + 2A + 4A + 7A             .931          84
Sum of 4A + 5A + 6A + 7A             .930          84
Sum of 1B + 2B + 4B + 7B             .979          81
Sum of 4B + 5B + 6B + 7B             .943          81

TABLE 50
CORRELATION WITH FIRST SEMESTER GRADES FOR PARTS OF IOWA PLACEMENT FRENCH
EXAMINATION, REV. B, AND EXPERIMENTAL FORM B, BASED ON SAME STUDENTS

Iowa Placement French Training Rev. B (N = 43)
Type                                                  r       P.E.
Part 1. Vocabulary recall                           .654     .057
Part 2. Grammar error correction                    .609     .064
Part 3. Verbs multiple-choice                       .590     .065
Part 4. Reading comprehension, written answer       .386     .086

Experimental French B-1 (N = 43)
Type                                                  r       P.E.
Part 1. Vocabulary multiple-choice                  .637     .059
Part 2. Grammar multiple-choice                     .736     .046
Part 4. Reading comprehension, true-false           .447     .081
Part 5. Vocabulary recall                           .694     .051
Part 6. Grammar error correction                    .756     .043
Part 7. Verbs multiple-choice                       .824     .004
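The probable errors attached to the coefficients in Tables 46, 47, and 50 agree with the classical formula P.E. = .6745(1 − r²)/√N. The sketch below applies it to a few entries of Tables 46 and 47. Some published values differ from the formula in the third decimal, possibly through rounding or slightly different N's, so the function is offered as a check on the tables rather than as a reconstruction of the original computations.

```python
import math

def probable_error_r(r, n):
    """Classical probable error of a Pearson r: 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)

# (r, N, published P.E.) taken from Tables 46 and 47.
checks = [
    (0.623, 227, 0.027),  # French A-2, Part 5
    (0.557, 234, 0.030),  # French B-1, Part 1
    (0.600, 84, 0.047),   # French A-1 and A-2 group, Part 5
]
for r, n, published in checks:
    computed = probable_error_r(r, n)
    print(f"r = {r:.3f}, N = {n}: P.E. = {computed:.3f} (published {published:.3f})")
```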


To determine the effectiveness of various combinations of parts,
their predictive power was found (Table 47). In the case of Form A
the total score obtained by adding the part scores for Parts 1, 2, 4,
and 7 correlated with first semester grades .694. In this combination
vocabulary, Part 1, and grammar, Part 2, are tested by the
multiple-choice type of response. For the combination of Parts 4, 5, 6,
and 7, the coefficient obtained was .708. The responses employed are,
for Part 5 (vocabulary), the recall type and, for Part 6 (grammar), the
error-correction type. Similar comparisons for Form B gave coefficients
of .613 and .606 respectively. The reliability coefficient (Table 49)
for the combination 1, 2, 4, and 7 is .931 and for the combination 4,
5, 6, and 7 is .930 in the case of Form A; for Form B these
coefficients are .979 and .943 respectively. The differences so far as
prediction is concerned are not significant. The first combination is
more reliable for Form B than for Form A.

The difference between the average z score for the A-B group and
the D-F group for the part combination 1, 2, 4, 7, Form A, is 2.377,
and for combination 4, 5, 6, 7 it is 2.115. In the case of the first
combination, .87 per cent of the D-F students reach or exceed the mean
of the A-B students, and for the second combination 1.70 per cent.
For Form B the differences between average z scores are 1.658 and
1.608 respectively, and the percentages of D-F students reaching or
exceeding the mean of the A-B students are 4.85 per cent and 5.37 per
cent respectively.

The results of the item analysis of the experimental forms are given
in Table 51. A comparison of the results for Part 1 (vocabulary,
multiple-choice type) and Part 5 (vocabulary, recall type) shows a
slight superiority for Part 5 of Form A …

In comparing Part 2 (grammar, multiple-choice type) and Part 6
(grammar, error-correction type), one finds that in Form A the
error-correction technique proves somewhat superior to the
multiple-choice technique, but that for Form B the situation is
reversed. As in the case of vocabulary, so in the grammar sections the
two techniques are about equal in effectiveness.

The idiom section, Part 3, and the new verb sections, Parts 7 and
8, show a large proportion of effective items. There are enough
effective verb items to permit the construction of revised parts which
will be made up entirely of valid and effective items.

TABLE 51
NUMBER OF ITEMS WITH DIFFERENCES BETWEEN MEAN PERCENTAGE CORRECT
FOR THE A-B AND D-F GROUPS WHICH ARE THREE OR MORE
TIMES THEIR PROBABLE ERRORS

            Forms A-1 and A-2            Forms B-1 and B-2
          Total     No. 3 or more      Total     No. 3 or more
          Items     times P.E.'s       Items     times P.E.'s
Part 1     60           33              60           36
Part 2     40           31              40           36
Part 3     20           14              20           13
Part 4     36           16              38           18
Part 5     60           35              60           30
Part 6     40           36              40           28
Part 7     31           25              31           21
Part 8     25           16              25           16

Part 4 (reading comprehension, true-false type) shows a smaller
proportion of effective items than the other parts. For Form A 44.4
per cent of the items show significant differences between the mean
percentage correct for the A-B and the D-F groups, and for Form B
47.4 per cent. For Part 4 of the placement examination, Form A,
employing the recall method of testing comprehension, 70 per cent of
the items gave significant differences. However, as it is possible to
employ a much larger number of true-false items in a given period of
time, the true-false method actually shows a larger number of effective
items. Further, many of the remaining true-false items show
differences which are almost three times their probable errors. From
the points of view of reliability and of effectiveness for prediction
the two methods seem to be approximately equal.

The new vocabulary sections show 55 per cent of effective items for
Form A and 60 per cent for Form B. This compares with 40 per cent
for Part 1 of French Training Revised A of the placement series.

The new grammar sections, Part 2 (multiple-choice), show 77.5 per
cent of effective items for Form A and 90 per cent for Form B. Part 6
(error correction) of the new forms shows 90 and 70 per cent of
effective items for Forms A and B respectively. For the French
Training Revised A (placement) Part 2 (grammar error correction), 60
per cent of the items were effective.


Part 7 Form A shows 80.7 per cent of effective items and Part 7 
Form B has 67.7 per cent. Part 3 of the French Training Revised A 
(placement) shows 60 per cent of effective items. These are verb 
sections all employing the multiple-choice type of response. In addi- 
tion to being more valid in content, the new verb parts contain a larger 
proportion of effective items. 
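The percentages of effective items quoted in the preceding paragraphs are the Table 51 counts divided by the number of items in each part. The short calculation below recomputes them from those counts. For Part 7 of Form A the counts give 25/31 = 80.6 per cent, a tenth below the 80.7 quoted in the text.

```python
# (effective items, total items) from Table 51.
table_51 = {
    ("A", "Part 1"): (33, 60), ("B", "Part 1"): (36, 60),
    ("A", "Part 2"): (31, 40), ("B", "Part 2"): (36, 40),
    ("A", "Part 4"): (16, 36), ("B", "Part 4"): (18, 38),
    ("A", "Part 6"): (36, 40), ("B", "Part 6"): (28, 40),
    ("A", "Part 7"): (25, 31), ("B", "Part 7"): (21, 31),
}

def percent_effective(effective, total):
    """Share of items whose A-B vs D-F difference is 3 or more P.E.'s, in per cent."""
    return round(100 * effective / total, 1)

for (form, part), (effective, total) in table_51.items():
    print(f"Form {form}, {part}: {percent_effective(effective, total)} per cent")
```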

In answer to questions 7 and 8 raised in Chapter III, as to the
possibility of constructing training tests sufficiently flexible to cover
a grade range from the eleventh year in high school to the beginning
of the second year of college, where courses are offered consecutively,
and at the same time maintain predictive power as a placement test,
it may be pointed out that for French, as at present organized, the
requirements for both purposes seem to be somewhat similar. The
vocabulary section was found to be most effective when based on the
2000 most common words; similarly for grammar and verbs the most
fundamental principles were the most effective for placement purposes;
and finally the reading comprehension section grades from very easy to
difficult. One problem which remains is that of giving enough tests
at the high school level to provide adequate norms for each grade.

The above statements apply when the present practice is followed
of admitting to second-year college French students with one year
of college or two years of high school French.


If the work was so 


The conclusions justified by the above results are: (1) that a valid
achievement test stressing the most fundamental elements of vocabulary,
grammar, and verbs has predictive power as high as or higher than that
of a test not based entirely on the most common elements; (2) that the
multiple-choice and true-false techniques as employed in the French
materials are as satisfactory for predictive purposes as the recall or
error-correction type of test; (3) that, when the experimental materials
have been revised by eliminating the ineffective items and the parts
weighted in accordance with the regression coefficients, they will be
superior to the original placement examinations.


SPANISH TRAINING 


ANALYTICAL PROCEDURES AND RESULTS 


The Spanish Training Examination is made up of four parts: vo- 
cabulary, grammar, verbs, and reading comprehension. The vocabulary 
section consists of fifty multiple-choice items, the grammar section of 
forty sentences, each of which contains an error to be corrected. In 
the verb section a sentence is given with the infinitive of the verb
to be employed in parentheses, and the student is to write the correct
verb form. Reading comprehension is tested by three paragraphs of
increasing difficulty, each followed by a number of questions in English
to be answered in English.

The results of the application of the partial and multiple correla- 
tion and regression technique to the Spanish test are given in Table 52. 


TABLE 52

PREDICTIVE POWER FOR FIRST SEMESTER GRADES IN SPANISH OF THE VARIOUS PARTS
OF THE SPANISH TRAINING EXAMINATION (REVISED) A
(MULTIPLE CORRELATION AND REGRESSION WEIGHTS)
SPANISH TRAINING REVISED A (N = 181)

Variables
Variable 1   First semester Spanish grades
Variable 2   Score Part 1
Variable 3   Score Part 2
Variable 4   Score Part 3
Variable 5   Score Part 4

Zero Order Coefficients

Variable     2      3      4      5      A.M.     Sigma
   1       .459   .547   .451   .452    4.216    1.057
   2              .674   .582   .576   11.738    4.288
   3                     .804   .655   11.662    6.742
   4                            .598    6.380    6.008
   5                                   24.318    7.898

Multiple Coefficient
R1.2345 = .570

Regression Coefficients and Equations
b12.345 =  .028      Deviation Form, Raw Scores:
b13.245 =  .059        x1 = .028x2 + .059x3 - .002x4 + .016x5
b14.235 = -.002      Deviation Form, z Scores:
b15.234 =  .016        z1 = .114z2 + .376z3 - .011z4 + .120z5


The multiple coefficient of .570 indicates that, so far as prediction
is concerned, the test is functioning as well as the better tests of
this type. There is no significant difference between the zero order
coefficient (.532) and the multiple coefficient, indicating that the
parts are already weighted nearly optimally. The regression coefficients
in z score form indicate that by far the largest relative contribution
to prediction is made by Part 2, with Parts 1 and 4 approximately equal
in this respect. Part 3, the verb section, has a very low negative
weight. This may be due to the extreme difficulty of this part, which
is discussed below.
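The z score weights in Table 52 can be checked against the raw-score coefficients with the usual conversion, beta = b x (sigma of the part / sigma of the grades). The short Python sketch below uses only the sigmas and b coefficients printed in the table; the function name is illustrative and not part of the study.

```python
def to_beta_weights(b_coeffs, sigmas_parts, sigma_criterion):
    """Convert raw-score (deviation form) regression coefficients to z score weights."""
    return [b * s / sigma_criterion for b, s in zip(b_coeffs, sigmas_parts)]

# Values printed in Table 52 (Spanish Training Revised A, N = 181)
sigma_grades = 1.057
sigma_parts = [4.288, 6.742, 6.008, 7.898]  # Parts 1-4 (variables 2-5)
b_raw = [0.028, 0.059, -0.002, 0.016]       # b12.345, b13.245, b14.235, b15.234

betas = to_beta_weights(b_raw, sigma_parts, sigma_grades)
for part, beta in enumerate(betas, start=1):
    print(f"Part {part}: {beta:+.3f}")
```

Rounded to three places, the results agree with the z score equation in the table, which confirms that the two printed equations are the same regression expressed in different units.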

The differences between the z scores of the A-B and D-F groups,
Table 53, show that each part discriminates highly between the superior
and inferior students. The z score difference for the whole test shows
that it is the third in effectiveness in the placement series.

The reliability coefficient for the entire test is .88. The reliability
coefficient for each part is given in Table 54.

TABLE 53
DIFFERENCES BETWEEN AVERAGE Z SCORES OF A-B AND D-F GROUPS AND PERCENTAGE
OF D-F STUDENTS REACHING OR EXCEEDING MEAN OF A-B STUDENTS

                              Whole Test  Part 1  Part 2  Part 3  Part 4
Z Score Difference               1.33      1.17    1.22    1.05    1.66
Percentage D-F reaching or
  exceeding Mean A-B             9.18     12.10   11.12   14.69    4.85
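The study does not state how the percentages in Table 53 were obtained, but they appear to match the area of a unit normal curve lying beyond each z score difference, as if the D-F scores were normally distributed with a standard deviation of one z unit. This is an assumption inferred from the printed figures. The Python sketch below applies that reading; `percent_beyond` is an illustrative name.

```python
import math

def percent_beyond(z_difference):
    """Percentage of a unit normal distribution lying at or beyond z_difference."""
    # 1 - Phi(z) equals 0.5 * erfc(z / sqrt(2))
    return 100 * 0.5 * math.erfc(z_difference / math.sqrt(2))

# z score differences from Table 53: whole test, then Parts 1-4
differences = [1.33, 1.17, 1.22, 1.05, 1.66]
for d in differences:
    print(f"difference {d:.2f}: {percent_beyond(d):.2f} per cent")
```

Under this reading, a larger z score difference leaves a smaller tail, which is why Part 4 (1.66) has the fewest D-F students reaching the A-B mean.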


TABLE 54
RELIABILITY COEFFICIENTS FOR EACH PART, SPANISH TRAINING
REVISED A (N = 100)

Part 1    Part 2    Part 3    Part 4
 .858      .871      .903      .779

The reliability of the test as a whole, while comparing favorably
with most educational tests, should be increased to above .90.
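The text does not say how the reliability is to be raised above .90. One conventional route is to lengthen the test with comparable items, and the Spearman-Brown prophecy formula estimates how much lengthening that would take. This is a standard formula and not a procedure the study reports using. The function names in the sketch are illustrative.

```python
def spearman_brown(r, k):
    """Predicted reliability when a test is lengthened k times with comparable items."""
    return k * r / (1 + (k - 1) * r)

def length_factor_needed(r_current, r_target):
    """Lengthening factor k that would raise reliability from r_current to r_target."""
    return r_target * (1 - r_current) / (r_current * (1 - r_target))

k = length_factor_needed(0.88, 0.90)
print(f"lengthen by a factor of {k:.2f}")
print(f"check: {spearman_brown(0.88, k):.3f}")
```

On this estimate, the whole test would need to be roughly a quarter longer to move from .88 to .90, provided that the added items behave like the existing ones.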
TABLE 55
NUMBER OF ITEMS SHOWING DIFFERENCES BETWEEN A-B AND D-F GROUPS WHICH
ARE THREE AND MORE AND FOUR AND MORE TIMES THEIR PROBABLE ERRORS

Spanish A-1   Total Items   No. 3 Times   No. 4 Times
              in Part       P.E.          P.E.
Part 1            50            30            21
Part 2            40            30            29
Part 3            40            24            22
Part 4            20            12            10

Compared with others in the placement series, the parts of the
examination rank very well so far as the number of items which
discriminate significantly between superior and inferior students is
concerned.
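The item criterion used throughout (a difference of three or more probable errors between the A-B and D-F groups) can be illustrated for a single item. The study does not spell out its formula. The sketch below assumes the conventional probable error of a difference between two independent proportions, 0.6745 x sqrt(p1q1/N1 + p2q2/N2). The proportions and group sizes are hypothetical.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error

def critical_ratio(p_high, n_high, p_low, n_low):
    """Difference between two proportions correct, expressed in probable errors."""
    se_diff = math.sqrt(p_high * (1 - p_high) / n_high + p_low * (1 - p_low) / n_low)
    return (p_high - p_low) / (PE_FACTOR * se_diff)

# Hypothetical item: 70 per cent of 50 A-B students vs 30 per cent of 50 D-F students correct
ratio = critical_ratio(0.70, 50, 0.30, 50)
print(f"{ratio:.2f} P.E.'s; effective at the 3 P.E. level: {ratio >= 3}")
```

With groups of this size, a gap of about 10 percentage points falls well short of three probable errors. A gap of 40 points clears the criterion easily.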

In building the vocabulary section, reliance was of necessity placed
on the judgment of Spanish instructors, as the word count by Buchanan
(8) was not then available. The selection, however, as measured by
frequency of occurrence, seems to have been well made, as 45 of
the 50 items fall within the first 2000 words in frequency.


The needed changes so far as this part is concerned were found to 
be: (1) to improve its content validity and (2) if possible to increase 
its predictive power. 

As in the case of French, the words discriminating highly between
superior and inferior students were found to be the rather common
words. However, in the Spanish section the most effective words fell
within the first 1500 in frequency instead of within the first 2000, as
in French. Of the words within the first 1500 in frequency, 66 per cent
discriminated significantly between the A-B and the D-F groups; of
those beyond the first 1500, only 28 per cent were similarly effective.
As the type of response employed, the multiple-choice, proved effective,
it should be retained.

The most important grammatical principles in Spanish have not
been scientifically determined. Expert judgment was therefore relied
upon. Three Spanish instructors, one a native Spaniard, agreed that
for the most part only the most common grammatical principles were
included. Items 17, 19, 22, 25, 27, 31, 34, and 38 were regarded
as invalid or as containing no error for correction. These must be
eliminated.

The grammar section, Part 2, may be described as reasonably valid
so far as content is concerned and very effective in discriminating
between superior and inferior students. It is the most important part
of the examination for the prediction of grades. The most desirable
change would seem to be that of making the scoring more objective
and more rapid, in addition to eliminating the six items classed as
invalid.


Part 3, the verb section, requires the student to give the form of
a verb which would correctly apply to the sentence involved; the
infinitive is supplied. This technique is somewhat objectionable
because of its lack of objectivity in scoring. Irregular verbs with
radical changes in spelling may be misspelled and the item missed for
that reason. The technique is also rather far removed in its demands
on the student from the present emphasis upon reading accomplishment.
The verbs employed, however, are all very common.


The distribution of persons tested is as follows: first person singular 
6; second singular 4; third singular 20; first plural 3; second plural 
none, and third plural 1. The extreme emphasis on third person singu- 
lar, one-half of all the items, is hardly justifiable. The number of 
these should be reduced and the number of third person plural items 
increased considerably, with some additions to the remaining persons. 


The number of times each tense was found to be tested is as follows: 


Present indicative 2 Present subjunctive 17 
Preterite indicative 3 Imperfect subjunctive 6 
Imperfect indicative 2 Infinitive 4 
Future indicative 1 Future subjunctive 1 
Conditional 1 Past participle 2 
Present participle 1 


Of the items testing the other tenses, eleven gave differences between
the percentages of A-B and D-F students answering the items correctly
which were at least three times their probable errors, whereas for the
subjunctives this was true for eleven of the twenty-two items given.

The regression coefficient for the verb section, given above, is low 
and negative, indicating that so far as prediction is concerned this 
part is making no contribution not already cared for by the other 
parts. 

This section is by far the most difficult of any part of the entire
placement series. The mean percentage of correct attempts was 29.66
for the superior students and 9.91 for the inferior group. Such
extreme difficulty is not necessary to maintain high predictive power,
as Parts 1, 2, and 4, which are far less difficult, each gave a
correlation coefficient with grades higher than that for Part 3. In one
group of students studied, forty-nine of seventy-seven scores were from
zero to three inclusive.

The conclusions relative to the verb section are: (1) a more
objective testing technique would be desirable; (2) it is too difficult
to be valid as an achievement test; (3) a better distribution of persons
tested can be made; (4) the tenses are not tested in proportion to
their importance.

Part 4 is effective as shown by (1) the regression coefficient, (2)
the average z score differences, and (3) the number of functioning
items. In general this part may be described as effective and valid.
It would be advantageous to develop a more objective type of response
susceptible to rapid scoring.

EXPERIMENTAL PROCEDURES

The experimental materials prepared attempted to apply the principles
developed in the above analysis of the Spanish Training (Revised A)
Examination.

The new vocabulary sections consist of sixty-two words selected
from the first 1500 of the Spanish wordbook by Buchanan (8). Those
words which were so common that they were not even included in
the word count were omitted. The method employed is multiple-choice,
with five responses to choose from.

The new grammar sections, containing forty-five items each, are
prepared with the same sentences presented in two ways: one calling
for the correction of an error, the correct word or phrase to be written
out as in the placement test, and the other using the multiple-choice
technique with five responses. Only the most important grammatical
principles are tested.

Verbs are tested in two ways. The first section consists of fifty
multiple-choice items, all in Spanish. The verbs employed are all
selected from the first 500 words in frequency of occurrence, to reduce
the vocabulary element to a minimum. The number of items testing
a particular tense is as far as possible in proportion to the importance
of that tense, and similarly for the various persons, such as third
singular. The second verb test is made up of twenty-five items
in which a Spanish verb form is followed by three English expressions,
only one of which is correct.

For reading comprehension the same three paragraphs are used as in the
placement examinations for Spanish. Instead of having the student write
out short . . . the entire paragraph.


The experimental forms, in brief, consist of parts as follows:

Spanish A-1 and Spanish B-1
  Part 1. Vocabulary, multiple-choice                        62 items
  Part 2. Grammar, error-correction                          45 items
  Part 3. Reading comprehension, true-false    Form A-1      39 items
                                               Form B-1      42 items

Spanish A-2 and Spanish B-2
  Part 4. Grammar, multiple-choice                           45 items
  Part 5. Verbs, multiple-choice                             50 items
  Part 6. Verbs, Spanish to English, multiple-choice         25 items


. . . same students. Where the same students were given the two forms,
in order to eliminate practice effects, one-half the group was given one
type of test on the first day and the other half the other type, with
these reversed two days later. For the smaller groups, results were also
available for Spanish Training, Form B of the placement series.

The inter-part correlations, Table 56, show a rather close relationship
between grammar, Part 4, and verbs, Part 5. Fairly high coefficients
also obtain between Part 1, vocabulary, and Part 2, grammar, and
between the two verb sections, Parts 5 and 6.

The correlations between the parts of A-1 and A-2 and the parts of
B-1 and B-2, Table 57, show fairly high coefficients between vocabulary,
Part 1, and grammar, Part 4, and also between Part 1 and verbs,
Part 5. A similar relationship holds for grammar, Part 2, and Parts
4 and 5. The correlations between reading comprehension, Part 3, and
Parts 4 and 5 are somewhat lower. The high correlation between the
two grammar sections, which employ the same sentences and vary only
in the type of response required, is to be expected. The rather high
correlation between verbs and grammar indicates that the two functions
are somewhat similar as they are measured here.

TABLE 56
INTER-CORRELATIONS OF PARTS

Spanish A-1 (N = 249)
           Part 1   Part 2   Part 3
Part 1       --      .674     .527
Part 2      .674      --      .415
Part 3      .527     .415      --

Spanish A-2 (N = 217)
           Part 4   Part 5   Part 6
Part 4       --      .594     .443
Part 5      .594      --      .602
Part 6      .443     .602      --

Spanish B-1 (N = 147)
           Part 1   Part 2   Part 3
Part 1       --      .716     .574
Part 2      .716      --      .549
Part 3      .574     .549      --

Spanish B-2 (N = 243)
           Part 4   Part 5   Part 6
Part 4       --      .825     .686
Part 5      .825      --      .751
Part 6      .686     .751      --

TABLE 57
INTER-CORRELATIONS OF RELATED PARTS OF EXPERIMENTAL FORMS

Spanish A-1 and Spanish A-2 (N = 67)
                  Spanish A-2
Spanish A-1     Part 4   Part 5
Part 1           .710     .714
Part 2           .784     .790
Part 3           .580     .649

Spanish B-1 and Spanish B-2 (N = 83)
                  Spanish B-2
Spanish B-1     Part 4   Part 5
Part 1           .704     .661
Part 2           .786     .773
Part 3           .530     .215


TABLE 58
CORRELATION BETWEEN PARTS OF IOWA PLACEMENT SPANISH EXAMINATION B AND
SIMILAR PARTS OF EXPERIMENTAL FORMS AND OF PARTS WITH FIRST SEMESTER GRADES
EXPERIMENTAL FORM B (N = 77)

                                  Experimental Form B
I. P. E. Spanish Training B    Part 2   Part 3   Part 4   Part 5   Grades
Part 2                          .797              .729              .658
Part 3                                                     .517     .612
Part 4                                   .516     .235              .395
Grades                          .701     .570     .685     .646


The correlations in Table 58 show that the old and the new grammar
sections, Part 2 I. P. E. and Parts 2 and 4 new, are quite closely
related in function, and that the I. P. E. verb test, Part 3, and the
new verb test, Part 5, are not closely related in function. As the
I. P. E. grammar test was effective, it was desired to have the new
forms correlate well with it. As the verb test was unsatisfactory, high
correlation would probably be undesirable. Since the two techniques for
testing reading comprehension are quite different in function, the final
selection between them must rest on predictive power. For this group
the true-false technique correlates .570 with grades and the I. P. E.
method .395. The correlations also indicate that the grammar and verb
sections of the new forms are as effective for grade prediction as the
I. P. E. form, and probably slightly more so. The smallest coefficient
used in the above interpretation is more than six times, and the largest
more than twenty-five times, its probable error.
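The "times its probable error" statements can be checked with the classical probable error of a correlation coefficient, P.E. = 0.6745 (1 - r^2) / sqrt(N). The same formula reproduces the P.E. column of Table 59, for example .024 for r = .669 with N = 249. This suggests, but does not prove, that it is the formula the study used. The function name in the sketch is illustrative.

```python
import math

def probable_error_r(r, n):
    """Probable error of a Pearson correlation coefficient r from a sample of size n."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

n = 77  # experimental Form B group, Table 58
for r in (0.395, 0.797):  # smallest and largest coefficients cited in the text
    pe = probable_error_r(r, n)
    print(f"r = {r:.3f}: P.E. = {pe:.3f}, r / P.E. = {r / pe:.1f}")
```

The ratios come out at about 6 and about 28, which is consistent with "more than six times" and "more than twenty-five times" in the text.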


The correlations for each part and for various part combinations
with first semester grades are given in Table 59 for the larger groups
and in Table 60 for the smaller groups.
TABLE 59
MEANS, STANDARD DEVIATIONS, AND CORRELATION COEFFICIENTS WITH FIRST SEMESTER
GRADES FOR LARGER GROUPS

                 r with                        Standard
                 Grades      P.E.     Mean     Deviation
Spanish A-1 (N = 249)
  Part 1          .669       .024     40.50      7.899
  Part 2          .599       .027     21.87     10.863
  Part 3          .437       .034     30.30      5.880
  Total           .694       .021     93.03     20.279
Spanish A-2 (N = 217)
  Part 4          .674       .024     29.73      7.749
  Part 5      (illegible)    .022     31.71      9.489
  Part 6          .508       .034     19.92      4.696
  Total           .748       .020     81.40     19.140
Spanish B-1 (N = 147)
  Part 1          .674       .029     35.35      9.507
  Part 2          .603       .035     18.25      9.510
  Part 3          .571       .037     31.92      6.612
  Total           .766       .023     85.53     22.050
Spanish B-2 (N = 243)
  Part 4          .659       .025     31.54      9.012
  Part 5          .608       .027   (illegible) (illegible)
  Part 6      (illegible)    .029   (illegible) (illegible)
  Total           .686       .023     81.24     21.845
TABLE 60
MEANS, STANDARD DEVIATIONS, AND CORRELATION COEFFICIENTS WITH FIRST SEMESTER
GRADES FOR SMALLER GROUPS

                           r with                       Standard
                           Grades     P.E.     Mean     Deviation
Spanish A-1 and A-2 (N = 67)
 Form A-1
  Part 1                    .709      .041     34.514     7.660
  Part 2                    .713      .032     13.582     9.576
  Part 3                    .434      .065     27.246     7.230
 Form A-2
  Part 4                    .752      .034     27.798     7.612
  Part 5                    .798      .029     30.082     9.308
  Part 6                    .675      .043     20.172     4.942
 Sum of 1 + 2 + 3 + 5       .811      .028    105.964    29.136
 Sum of 1 + 3 + 4 + 5       .784      .031    119.747    28.441
Spanish B-1 and B-2 (N = 83)
 Form B-1
  Part 1                    .721      .035     32.60      9.216
  Part 2                    .657      .041     14.89      9.561
  Part 3                    .685      .039     31.01      7.740
 Form B-2
  Part 4                    .590      .047     26.06      9.288
  Part 5                    .646      .043     27.83      9.636
  Part 6                    .472      .056     17.75      4.174
 Sum of 1 + 2 + 3 + 5       .755      .031    105.69     31.088
 Sum of 1 + 3 + 4 + 5       .690      .038    117.21     31.548

TABLE 61
RELIABILITY COEFFICIENTS FOR PARTS

Spanish A-1 (N = 254)          Spanish A-2 (N = 266)
  Part 1    .823                 Part 4    .863
  Part 2    .922                 Part 5    .913
  Part 3    .805                 Part 6    .925

Spanish B-1 (N = 150)          Spanish B-2 (N = 253)
  Part 1    .898                 Part 4    .918
  Part 2    .908                 Part 5    .900
  Part 3    .865                 Part 6    .863

TABLE 62
RELIABILITY COEFFICIENTS FOR VARIOUS COMBINATIONS OF PARTS

                                 Coefficient     N
Sum of 1A + 2A + 3A + 5A            .933        67
Sum of 1A + 3A + 4A + 5A            .929        67
Sum of 1B + 2B + 3B + 5B            .967        83
Sum of 1B + 3B + 4B + 5B            .922        83
The results presented in Table 59 show that the A and B forms are
essentially the same in predictive power, although Part 3 of Form A-1
is somewhat less effective than Part 3 of B-1. This is also borne out
by the correlations given for the smaller groups, Table 60. Should
future results be similar, it would seem that these tests, in addition
to being improved in content, are probably more effective for the
prediction of first semester grades than the Iowa Placement Spanish
Examination.


The means and standard deviations for A-2 and B-2 are very close,
indicating the essential equivalence of these forms. In the case of
Forms A-1 and B-1, however, there is considerable divergence. An
examination of the scores for the various colleges perhaps explains
this. Two of the schools which coöperated in giving Form A-1 have
highly selected students, and the means for these two groups are near
the upper quartile of the entire sample. In all probability, therefore,
Forms A-1 and B-1 are more nearly equivalent than the above results
would indicate.


The results given in Table 60 for the smaller groups, which were
given both Forms A-1 and A-2 and Forms B-1 and B-2, show that
Part 2, grammar (error-correction type), is about the same so far as
the prediction of grades is concerned as Part 4, grammar
(multiple-choice). The correlations in Table 59 also point to the same
conclusion. As the multiple-choice technique permits more rapid scoring,
it would seem that it should be accepted in place of the
error-correction type of response. So far as reliability is concerned,
the coefficients given in Table 61 indicate that the two grammar
sections are approximately equal. The reliability of all the parts in
both forms is high.


The two combinations of parts, Table 60, predict grades about
equally well, with a slight but not statistically significant difference
in favor of the first combination. The two combinations are identical
except for the use of Part 2, grammar (error-correction), in the first
combination and Part 4, grammar (multiple-choice), in the second
combination. The two combinations are also nearly equal in reliability,
as shown in Table 62.

The results of the item analysis of the new forms are given in Table
63.
TABLE 63
NUMBER OF ITEMS IN EACH PART SHOWING DIFFERENCES BETWEEN MEAN PERCENTAGE
CORRECT FOR THE A-B AND D-F GROUPS WHICH ARE THREE OR MORE
TIMES THEIR PROBABLE ERRORS

                        Total Items    No. 3 or More Times P. E.'s
Spanish A-1 and A-2
  Part 1                    62                   40
  Part 2                    45                   42
  Part 3                    39                   11
  Part 4                    45                   37
  Part 5                    50                   46
  Part 6                    25                   19
Spanish B-1 and B-2
  Part 1                    62                   46
  Part 2                    45                   37
  Part 3                    42                   14
  Part 4                    45                   42
  Part 5                    50                   46
  Part 6                    25               (illegible)


The item analysis of the experimental forms shows that the new
vocabulary section, Part 1 of Forms A-1 and B-1, contains 64.5
and 75.8 per cent respectively of effective items. This compares with
60 per cent of such items for the placement section (Table 55). The
new grammar sections, Part 2 of Forms A-1 and B-1 and Part 4 of
Forms A-2 and B-2, the first employing the error-correction technique
and the second the multiple-choice technique, are equal so far as the
number of effective items is concerned: Parts 2A and 4B contain 93.3
per cent and Parts 2B and 4A 82.2 per cent of effective items. The
placement grammar section shows 75 per cent of such items (Table 55).
The new verb sections, Part 5 of both forms, each show 92 per cent of
effective items, whereas the placement verb section, Part 3 (Table 55),
shows 60 per cent. The Spanish-to-English verb section, Part 6 of
Forms A-2 and B-2, contains 76 and 82 per cent respectively of
effective items.

The application of the principles derived from the analysis has
resulted in considerable improvement in the vocabulary, grammar, and
verb sections. For the reading comprehension sections, Part 4
(placement), Form A, and Part 3 (new), Forms A-1 and B-1, the
percentages of effective items are 60, 28.2, and 33.3 respectively,
indicating a marked superiority for the placement section in this
respect.
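The percentages quoted above come from dividing the Table 63 and Table 55 counts of items that reach three probable errors by the number of items in each part. The Python sketch below performs that division for a selection of legible entries. The dictionary labels and the function name are illustrative.

```python
def percent_effective(effective, total):
    """Share of items whose A-B vs D-F difference reaches three probable errors."""
    return 100 * effective / total

# (items reaching 3 P.E.'s, items in part), from Tables 55 and 63
parts = {
    "Placement vocabulary (Table 55)": (30, 50),
    "New vocabulary, Part 1, Form A": (40, 62),
    "Grammar, Part 2, Form A": (42, 45),
    "Grammar, Part 4, Form A": (37, 45),
    "Verbs, Part 5, Form A": (46, 50),
    "Reading, Part 3, Form A": (11, 39),
    "Reading, Part 3, Form B": (14, 42),
}
for name, (effective, total) in parts.items():
    print(f"{name}: {percent_effective(effective, total):.1f} per cent")
```

Comparing parts by percentage rather than by raw count matters here because the parts differ in length, from 39 to 62 items.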


In answer to questions 7 and 8 raised in Chapter III, as to the
possibility of constructing training tests sufficiently flexible to cover
a grade range from the eleventh year in high school to the beginning
of the second year of college, where courses are offered consecutively,
and at the same time maintain predictive power as a placement test,
it may be pointed out that for Spanish, as at present organized, the
requirements for both purposes seem to be similar. The vocabulary
section was found to be most effective when based on the 1500 most
common words; similarly for grammar and verbs the most fundamental
principles were the most effective for placement purposes; and finally
the reading comprehension sections grade from very easy to difficult.
One problem which remains is that of giving enough tests at the high
school level to provide adequate norms for each grade.

If the high school courses were so articulated with college courses
that they were truly consecutive, the tests could not be said to cover
the broad range referred to above. It would perhaps be safer to
say that the range of the tests is two years of high school and one and
one-half years of college Spanish. Courses at the college level vary
greatly, some being three-hour courses and others five-hour courses.

The results given above indicate the following conclusions: (1) that
a valid content examination is as effective for college placement as one
based on less common and less valid materials; (2) that the
multiple-choice response is practically as effective as the
error-correction response for this type of test; (3) that the true-false
method of testing reading comprehension, while measuring a very different
set of functions from the written-answer method, is as effective for the
prediction of grades as the written-answer method; it is slightly more
reliable and is more easily scored, although the written-answer method
shows a much larger proportion of effective items; (4) that the new
examinations, when revised on the basis of the results of the item
analysis, will be at least as effective as the Iowa Placement Spanish
Examination for the prediction of grades and will be far more valid as
to content in Part 3 and improved in this respect in Part 1.

FOREIGN LANGUAGE APTITUDE


ANALYTICAL PROCEDURES AND RESULTS

The Foreign Language Aptitude Revised A Examination was found
to be the most effective examination in the placement series, so much
so, in fact, that, considering the unreliability of college grades, it
was decided not to attempt any radical changes. The results below apply
only to French, as this was the only language for which the test
had been widely employed.


This examination consists of four parts as follows: Part 1 directs 
the student to make English nouns and pronouns plural, to change 
verb tenses and to form nouns from verbs; Part 2 tests ability to infer 
from resemblance to English the meaning of words in Esperanto; 
Part 3 outlines certain rules of grammar applying to Esperanto and 
calls for their application; Part 4 gives sentences in Esperanto and 
the English translation, requiring the student to derive the English 
equivalent of given Esperanto words and the Esperanto equivalent of 
given English words. 


The application of partial and multiple correlation and regression 
technique to this examination is given in Table 64. 


The reliability coefficient of the test is the highest in the series, 
being .97 which is about as high as can be obtained for such tests. 


The multiple coefficient .683 indicates that the test is functioning 
as well as the best available predictive instruments. This is further 
borne out by the uniformly high coefficients obtained by the various 
colleges and universities employing the test, as indicated in Table 1. 


The regression coefficients in z score form indicate that Parts 1 and
2 are carrying almost the entire predictive load, Part 1 being the most
important. This would suggest that in scoring, Part 1 should be heavily
weighted, with Part 2 next, and less weight given to Parts 3 and 4.


TABLE 64

PREDICTIVE POWER FOR FIRST SEMESTER GRADES IN FRENCH OF THE VARIOUS PARTS OF
THE FOREIGN LANGUAGE APTITUDE EXAMINATION (REVISED) A
(MULTIPLE CORRELATIONS AND REGRESSION WEIGHTS)
FOREIGN LANGUAGE APTITUDE REVISED A (N = 209)

Variables
Variable 1   First semester French grades
Variable 2   Score Part 1
Variable 3   Score Part 2
Variable 4   Score Part 3
Variable 5   Score Part 4

Zero Order Coefficients

Variable     2      3      4      5      A.M.     Sigma
   1       .653   .552   .439   .482    3.78     1.381
   2              .606   .596   .573   25.62    12.007
   3                     .588   .661   16.91     6.989
   4                            .454   18.70     6.554
   5                                   11.91    12.741

Multiple Coefficient
R1.2345 = .683

Regression Coefficients and Equations
b12.345 =  .081      Deviation Form, Raw Scores:
b13.245 =  .062        x1 = .081x2 + .062x3 - .002x4 + .006x5
b14.235 = -.002      Deviation Form, z Scores:
b15.234 =  .006        z1 = .704z2 + .314z3 - .009z4 + .055z5

The differences between the average z scores of the A-B and of the
D-F students are high for every part except Part 4, as shown in Table
65, and for the whole test the highest difference of the entire
placement series was found. The percentage of D-F students reaching or
exceeding the mean of the A-B students for the entire test was 3.29.


TABLE 65
DIFFERENCES BETWEEN THE MEAN Z SCORES OF THE A-B AND D-F GROUPS AND
PERCENTAGE OF D-F STUDENTS REACHING OR EXCEEDING MEAN OF A-B STUDENTS

                              Whole Test  Part 1  Part 2  Part 3  Part 4
Difference                    (illegible in source)
Percentage D-F reaching or
  exceeding Mean A-B             3.29      6.30    9.85   13.57   30.90

Table 66 shows the number of items which give differences between
the A-B and D-F groups which are three or more and four or more
times their probable errors. No other examination in the entire series
shows so many effective items. While Part 3 is lower in this respect
than the other parts, it is still fairly effective in comparison with
other examinations in this series.


TABLE 66
NUMBER OF ITEMS IN EACH PART SHOWING DIFFERENCES BETWEEN MEAN PERCENTAGE
CORRECT FOR A-B AND D-F GROUPS WHICH ARE THREE OR MORE AND FOUR OR MORE
TIMES THEIR PROBABLE ERRORS

           Total    No. 3 or More    No. 4 or More
           Items    Times P.E.'s     Times P.E.'s
Part 1      50          43               35
Part 2      40          25               19
Part 3      30          18               11
Part 4      30      (illegible)          25
In conclusion, the Foreign Language Aptitude Test is the most effective
in the placement series for the prediction of first semester grades,
which is its primary function. This is shown by the high multiple
correlation coefficient, by the almost uniformly high coefficients
reported to date, by the high reliability coefficient, and by the high
discriminatory power of the items, the parts, and the test as a whole.

Studies should be carried out to determine the utility of this test
for German, Spanish, and Latin. It is possible that the part weightings,
as well as the predictive power of the test, may vary from language to
language.


CHAPTER VII


MATHEMATICS APTITUDE AND MATHEMATICS TRAINING: 
ANALYTICAL AND EXPERIMENTAL RESULTS 


MATHEMATICS APTITUDE 


ANALYTICAL PROCEDURES AND RESULTS 


The four parts of the Mathematics Aptitude Examination are as
follows:

Part 1. (15 items, 5 minutes.) Arithmetic and algebraic number
series similar to those commonly found in intelligence tests.

Part 2. (15 items, 10 minutes.) Primarily a test of constructive
imagination. Solution of the problems depends upon the student's
ability to visualize geometric figures and to see the relations involved.

Part 3. (20 items, 10 minutes.) A test of logic. It measures
ability to symbolize and to comprehend abstract relations.

Part 4. (18 items, 15 minutes.) A measure of mathematical reading
comprehension. The material (calculus) is unfamiliar to the
student, and correct answers call for a grasp of relations and principles.

The results of the application of the partial and multiple correlation
and regression technique to the sample employed in the analytical
study are given in Table 67.
The multiple correlation coefficient .516 indicates that the combination
of parts employed is only fairly effective for the prediction of
grades. This is in part due to the very low correlation of Part 3
with grades, which would make this part of slight value even if it were
almost entirely unique in function. The fairly high negative regression
coefficient for Part 3, however, indicates that it is contributing
nothing to prediction which is not already cared for by the other
sections.

Since the multiple coefficient .516 is considerably higher than the
zero order coefficient of .405, a better weighting of parts is needed.
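The multiple coefficient in Table 67 can be recovered from the z score regression weights and each part's zero-order correlation with grades. For least-squares weights, R^2 = sum of beta_j x r_1j. The Python sketch below applies this identity to the rounded weights printed in Table 67. It yields about .51, close to the reported .516, with the small gap attributable to rounding. The function name is illustrative.

```python
import math

def multiple_r(betas, validities):
    """Multiple correlation from standardized regression weights and zero-order validities."""
    r_squared = sum(b * r for b, r in zip(betas, validities))
    return math.sqrt(r_squared)

# Table 67 (Mathematics Aptitude Revised A, N = 223)
betas = [0.085, 0.251, -0.121, 0.373]       # z score weights, Parts 1-4
validities = [0.354, 0.332, 0.129, 0.444]   # zero-order r of each part with grades
print(f"R = {multiple_r(betas, validities):.3f}")
```

The identity also shows why Part 3's negative weight costs little: its term (-.121 x .129) subtracts only about .016 from R^2.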
TABLE 67
PREDICTIVE POWER FOR FIRST SEMESTER GRADES IN MATHEMATICS OF THE VARIOUS
PARTS OF THE MATHEMATICS APTITUDE EXAMINATION (REV.) A
(MULTIPLE CORRELATION AND REGRESSION WEIGHTS)
MATHEMATICS APTITUDE REVISED A (N = 223)

Variables
Variable 1   First semester Mathematics grades
Variable 2   Score Part 1
Variable 3   Score Part 2
Variable 4   Score Part 3
Variable 5   Score Part 4

Zero Order Coefficients

Variable     2      3      4      5      A.M.     Sigma
   1       .354   .332   .129   .444    3.628    1.263
   2              .449   .302   .493    7.135    3.818
   3                     .394   .213    5.180    2.501
   4                            .276    7.619    4.767
   5                                    7.354    2.841

Multiple Coefficient
R1.2345 = .516

Regression Coefficients and Equations
b12.345 =  .028      Deviation Form, Raw Scores:
b13.245 =  .127        x1 = .028x2 + .127x3 - .032x4 + .166x5
b14.235 = -.032      Deviation Form, z Scores:
b15.234 =  .166        z1 = .085z2 + .251z3 - .121z4 + .373z5


The regression coefficients in z score form indicate that Part 4 is by far the most important for prediction, with Part 2 next in importance and Part 1 of comparatively slight value. Part 3 has been discussed above. The reliability coefficient of .86 for the whole test is fairly high. The part reliabilities are shown in Table 68.
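The z score weights are the raw-score weights rescaled by the ratio of each part's standard deviation to that of the grades, $\beta_j = b_j \sigma_j / \sigma_1$, and the multiple coefficient follows from $R^2 = \sum_j \beta_j r_{1j}$. These are standard regression identities rather than formulas stated in the study; the sketch below, with hypothetical function names, applies them to the Table 67 figures.

```python
import math

# Values from Table 67 (Mathematics Aptitude, Revised A, N = 223).
SIGMA_GRADE = 1.263
PART_SIGMAS = [3.818, 2.501, 4.767, 2.841]     # Parts 1-4
RAW_WEIGHTS = [0.028, 0.127, -0.032, 0.166]    # raw-score (deviation form) b's
VALIDITIES = [0.354, 0.332, 0.129, 0.444]      # zero order r of each part with grades


def standardized_weights(raw_weights, part_sigmas, sigma_criterion):
    """Convert raw-score regression weights to z score (beta) weights."""
    return [b * s / sigma_criterion for b, s in zip(raw_weights, part_sigmas)]


def multiple_r(betas, validities):
    """Multiple correlation from beta weights: R^2 = sum(beta_j * r_1j)."""
    return math.sqrt(sum(beta * r for beta, r in zip(betas, validities)))


betas = standardized_weights(RAW_WEIGHTS, PART_SIGMAS, SIGMA_GRADE)
print([round(beta, 3) for beta in betas])    # [0.085, 0.251, -0.121, 0.373]
print(round(multiple_r(betas, VALIDITIES), 3))
```

The rescaled weights agree with the z score equation of Table 67 to three places, and the recomputed multiple coefficient comes out at about .513, close to the printed .516.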


TABLE 68
RELIABILITY COEFFICIENTS OF PARTS

Part 1    Part 2    Part 3        Part 4
 .87       .66      (illegible)    .68


With the exception of Part 1, none of the reliability coefficients is very high. In fact, they are among the lowest found in the entire placement series. This is in part due to the fact that each item requires more time for solution than in many other fields, which makes it impossible to have a section of any considerable length and in turn tends to make for low reliability. For this reason it may prove necessary to use both Forms A and B where high reliability is required.
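The gain from combining forms is conventionally estimated with the Spearman-Brown prophecy formula, $r_{nn} = n r / (1 + (n - 1) r)$, where $n$ is the factor by which the test is lengthened. The study does not name the formula; the sketch below assumes that Forms A and B are parallel, so that using both doubles the length ($n = 2$), and applies it to reliabilities reported above for Form A.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by `length_factor`
    with parallel material (Spearman-Brown prophecy formula)."""
    if not 0 <= reliability <= 1 or length_factor <= 0:
        raise ValueError("reliability must be in [0, 1] and length_factor > 0")
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Using Forms A and B together doubles the length (length_factor = 2),
# on the assumption that the two forms are parallel.
for label, r in [("Whole test", 0.86), ("Part 2", 0.66), ("Part 4", 0.68)]:
    print(f"{label}: {r:.2f} -> {spearman_brown(r, 2):.2f}")
```

Under that assumption the whole test would rise from .86 to about .92, Part 2 from .66 to about .80, and Part 4 from .68 to about .81.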

The differences (Table 69) between the average z scores of the A-B 
group and of the D-F group are not very high for any of the parts 
but are fairly satisfactory for Parts 1 and 4. Part 2 is quite low 
and Part 3 exceedingly low, being among the least effective in the 
entire placement series in this respect. 

The substitution of a more effective part for Part 3 should improve 
the effectiveness of the entire test for discriminating between inferior 
and superior students. 

The results of the item analysis, Table 70, show that Part 4 contains more items which discriminate highly between superior and inferior students than any of the other parts. Parts 1 and 2 rank next in the number of effective items and Part 3 is lowest, containing no items with differences four times their probable errors and but two out of the twenty items showing differences which are at least three times their probable errors. No other section in the entire placement series is as low as this.

TABLE 69
DIFFERENCES BETWEEN AVERAGE Z SCORES OF A-B AND D-F STUDENTS AND PERCENTAGE
OF D-F GROUP REACHING OR EXCEEDING MEAN OF A-B GROUP

                              Whole Test  Part 1  Part 2  Part 3  Part 4
Z Score Difference                .98       .90     .80     .50     .96
Percentage D-F Reaching or
  Exceeding Mean of A-B          16.35     18.41   21.19   31.85   16.85

TABLE 70
NUMBER OF ITEMS IN EACH PART SHOWING DIFFERENCES BETWEEN MEAN PERCENTAGE
CORRECT FOR A-B AND D-F GROUPS WHICH ARE THREE OR MORE AND FOUR OR MORE
TIMES THEIR PROBABLE ERRORS

          Total    No. 3 or More    No. 4 or More
          Items    Times P.E.       Times P.E.
Part 1     15          6                2
Part 2     15          6                3
Part 3     20          2                0
Part 4     15          8                5
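The criterion used in the item analyses compares, item by item, the percentage correct in the A-B group with that in the D-F group and asks whether the difference is at least three (or four) times its probable error. The study does not print its formula or its group sizes; the sketch below assumes the usual standard error of a difference between two independent proportions, converted to a probable error with the factor .6745. The item percentages and group sizes in the example are hypothetical.

```python
import math

PE_FACTOR = 0.6745  # probable error = 0.6745 x standard error


def difference_in_pe_units(p_ab, n_ab, p_df, n_df):
    """Difference between the proportions of A-B and D-F students answering an
    item correctly, expressed in multiples of its probable error."""
    se_diff = math.sqrt(p_ab * (1 - p_ab) / n_ab + p_df * (1 - p_df) / n_df)
    if se_diff == 0:
        raise ValueError("both proportions are 0 or 1; the ratio is undefined")
    return (p_ab - p_df) / (PE_FACTOR * se_diff)


# Hypothetical item: 80 per cent of 50 A-B students and 40 per cent of
# 50 D-F students answer it correctly.
ratio = difference_in_pe_units(0.80, 50, 0.40, 50)
print(round(ratio, 2))          # 6.63
print(ratio >= 3, ratio >= 4)   # True True: counted in both columns of Table 70
```

An item passing the four-P.E. test is also counted among those passing the three-P.E. test, which is why the "3 or more" column of Table 70 is never smaller than the "4 or more" column.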
The items in Part 1 which discriminated well between the A-B and the D-F groups are those involving fractions, alternate series, and literal numbers. It would seem that items containing two or more distinct sets of factors or steps which are not too apparent are effective. In Part 2 the successful items are those which presented clear-cut situations for solution and for which the figures to be drawn are not too involved. In several of the problems technical terms are employed which can better be considered as dependent on training rather than aptitude. Some of these are "vertices," "tangent," and the naming of three types of geometrical figures produced (item 11), one of which is a trapezoid. Part 3 is so ineffective as to warrant no discussion.
In Part 4, of the fifteen items eight show statistically reliable differences 
and four other differences are almost three times their probable errors. 

The analytical results may be summarized as follows: (1) the test 
as a whole is fairly effective for the prediction of first semester grades; 
(2) the parts with the exception of Part 1 need to be increased in 
reliability; (3) Part 3 is of very little value in any respect; (4) Parts 
4, 2, and 1 are contributing to prediction in the order named, with 
Part 4 by far the most important; (5) the needed changes, in addition to those already given, are (a) to secure a larger number of effective items in Part 1, (b) to eliminate from Part 2 uncommon mathematical terms which have no place in an aptitude test and to build valid and effective items for this part, and (c) to replace Part 3 entirely with a more valid section.


EXPERIMENTAL PROCEDURES AND RESULTS 
The experimental forms devised may be described as follows: 
Form A-1 
Part 1A. Fifteen items similar to Part 1 of the placement examination but employing fractions, literal numbers, and series with two or more steps involved in their solution.

Part 2A. A fore-exercise only.

Part 3A. Ten items. The term “reciprocal of a number” is explained and its computation illustrated. This is followed by problems requiring the computation of the reciprocals of whole numbers, fractions, sums, etc. Indirect uses of the principle are also called for.
Form B-1 


Part 2B. Twenty-two items. The meaning of “exponent” in mathematics is explained and illustrated, especially as it is applied to writing large numbers. The student is then asked to apply the principle. Next the multiplication and division of numbers written exponentially are illustrated and followed by problems for solution.

Part 3B. Problems similar to those in Part 2 of the placement examination, except that arithmetic reasoning problems are included. The arithmetical computation is, however, kept exceedingly simple, in some cases requiring only the addition or subtraction of single-digit numbers. The drawing of figures is frequently necessary. Six of the items require the application of principles which have been explained.


Form C-1 
Part 1C. Same type as Part 1A. 
Part 2C. Similar to 1C but more difficult (10 items). 


Part 3C. Same type as 3B. 
The results for these new forms are given in the tables below. 


TABLE 71
INTER-PART CORRELATIONS

Form A-1 (N = 253)
Part 1A vs. 3A = .456

Form B-1 (N = 264)
Part 2B vs. 3B = .384

Form C-1 (N = 254)
           Part 1C   Part 2C   Part 3C
Part 1C       -        .643      .340
Part 2C     .643        -        .361
Part 3C     .340       .361       -

The inter-part correlations are fairly low except between 1C and 2C, both of which are number series, Part 2C being much more difficult than Part 1C. In any case the best number series items will be combined and not employed as separate parts; accordingly the rather marked inter-correlation is of no practical importance. It does show that the difficult series, while moderately related to the less difficult series, does not measure the same functions.

The reliability coefficients of the parts, Table 72, are distinctly higher than those of the corresponding parts of the placement examination except for the sections testing number series. As possible substitutes for Part 3 of the placement examination, both Parts 3A and 2B are superior in reliability, especially Part 2B.
TABLE 72
RELIABILITY COEFFICIENTS OF PARTS

Form A-1 (N = 261)        Form B-1 (N = 283)
Part 1A   .885            Part 2B   .892
Part 3A   .738            Part 3B   .267

Form C-1 (N = 263)
Part 1C   .887
Part 2C   .729
Part 3C   .715

TABLE 73
CORRELATIONS WITH FIRST SEMESTER GRADES, MEANS AND STANDARD DEVIATIONS

                        r     P.E.r    Mean     Standard Deviation
Form A-1 (N = 253)
Part 1A               .309    .038     8.265    4.160
Part 3A               .490    .032     6.818    2.638
Form B-1 (N = 264)
Part 2B               .468    .033    16.174    4.082
Part 3B               .468    .033     8.848    3.196
Form C-1 (N = 254)
Part 1C               .444    .034     9.634    3.932
Part 2C               .406    .035     4.441    2.188
Part 3C               .345    .037    10.697    2.999


The correlations with first semester grades show that, of the new number series, Part 1A is about the same in effectiveness for the prediction of grades as the number series of the placement examination, Part 1, and that the new Parts 1C and 2C are superior to either the placement section or Part 1A. A selection of the best items from all parts should give a resultant part which is somewhat superior to the present Part 1 of the placement examination.
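The P.E.r column of Table 73 can be checked with the classical probable error of a correlation coefficient, $PE_r = .6745(1 - r^2)/\sqrt{N}$. A short sketch (the function name is hypothetical) reproduces the printed values for the Form A-1 and Form C-1 rows.

```python
import math


def probable_error_r(r, n):
    """Probable error of a Pearson correlation: .6745 * (1 - r**2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / math.sqrt(n)


# (part, r, N) as printed in Table 73.
rows = [
    ("Part 1A", 0.309, 253),
    ("Part 3A", 0.490, 253),
    ("Part 1C", 0.444, 254),
    ("Part 2C", 0.406, 254),
    ("Part 3C", 0.345, 254),
]
for part, r, n in rows:
    print(f"{part}: r = {r:.3f}, P.E. = {probable_error_r(r, n):.3f}")
```

The printed probable errors are .038, .032, .034, .035, and .037, matching the table.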

Of the new problem sections designed to provide additional items for Part 2 of the placement examination, Part 3B is the more effective, with 3C considerably below 3B for the prediction of grades. As Part 2 of the placement examination was fairly effective to begin with, a proper combination of items will overcome the objections to certain items in Part 2 outlined above and give a more reliable section.

The new parts prepared as possible substitutes for Part 3 of the placement examination, 3A and 2B, give the highest correlation coefficients obtained with first semester grades and are both more reliable than the part which they are to replace, especially Part 2B. Both Parts 3A and 2B have low inter-correlations with the parts with which they were compared. Part 3A is slightly superior to 2B for the prediction of grades, but 2B is distinctly superior to 3A in reliability. On purely practical grounds 3A could be more rapidly scored, as it has fewer items, and could be somewhat more easily adapted to the space requirements for printing.


TABLE 74
NUMBER OF ITEMS SHOWING DIFFERENCES BETWEEN MEAN PERCENTAGE CORRECT FOR
THE A-B AND THE D-F GROUPS WHICH ARE THREE OR MORE TIMES
THEIR PROBABLE ERRORS

              Total Items    No. 3 or More Times P.E.'s
Form A-1
Part 1A           15                 12
Part 3A           10                  9
Form B-1
Part 2B           22                 18
Part 3B           15                 10
Form C-1
Part 1C           15                 12
Part 2C           10                  8
Part 3C           15                  6


The item analysis of the experimental forms, Table 74, indicates that the new number series sections, Parts 1A, 1C, and 2C, each contain 80 per cent of effective items. The placement examination, Part 1, Form A, gave but 40 per cent of such items. The marked improvement in the new forms points to the correctness of the conclusions drawn from the analysis of Part 1 (placement). Of the new number series sections, Part 2C is far more difficult than Parts 1A and 1C. The results show that this extreme difficulty does not add to the effectiveness of Part 2C. This is also borne out by the correlations given above.

The new problem sections, Parts 3B and 3C, designed to give effective items to replace the ineffective items of Part 2 of the placement examination, contain 66.6 and 40 per cent respectively of such items, which is a rather low proportion in Part 3C. The placement problem section, Part 2, contains 40 per cent of effective items.

Parts 3A (reciprocals) and 2B (exponential arithmetic), prepared as possible substitutes for Part 3 (placement), which was found to be very ineffective, contain 90 and 81.8 per cent respectively of effective items, ranking very high in this respect. Both parts were previously shown to be effective for the prediction of grades and both are reliable, especially Part 2B.
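The percentages quoted in these paragraphs are the counts of Table 74 expressed as a share of each part's items. A minimal sketch, with a hypothetical helper name:

```python
# (part, total items, items with differences >= 3 times their P.E.) from Table 74.
TABLE_74 = [
    ("Part 1A", 15, 12),
    ("Part 3A", 10, 9),
    ("Part 2B", 22, 18),
    ("Part 3B", 15, 10),
    ("Part 1C", 15, 12),
    ("Part 2C", 10, 8),
    ("Part 3C", 15, 6),
]


def percent_effective(total_items, effective_items):
    """Per cent of a part's items that discriminate at the 3 P.E. level."""
    if total_items <= 0 or not 0 <= effective_items <= total_items:
        raise ValueError("need total_items > 0 and 0 <= effective_items <= total_items")
    return 100.0 * effective_items / total_items


for part, total, effective in TABLE_74:
    print(f"{part}: {percent_effective(total, effective):.1f} per cent")
```

Rounded to one decimal place this gives 80.0, 90.0, 81.8, 66.7, 80.0, 80.0, and 40.0 per cent for the seven parts in table order.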

The conclusions which seem warranted by the above results are: 
(1) that Parts 1, 2, and 4 of the placement examination should be 
retained but improved in validity and Parts 2 and 4 made more reliable 
by the substitution of more valid and reliable items; (2) that Part 3 
should be discarded; (3) that for Part 1 (placement) the new parts 
1A, 1C, and 2C, and for Part 2 (placement) the new parts 3B and 3C, will furnish additional valid and reliable items which should add to their effectiveness; (4) that the substitute parts 3A and 2B, designed to replace Part 3 (placement), are distinctly more reliable and more
valid for the prediction of grades than the part to be eliminated; (5) 
that the resultant examination will be more reliable and effective than 
the present placement examination. 


MATHEMATICS TRAINING
ANALYTICAL PROCEDURES AND RESULTS 


The Mathematics Training Examination consists of four parts as 
follows: 

Part 1. Twenty problems devoted to the fundamentals of arith- 
metic, each bringing out a different small skill. They are drawn for 
the most part from teacher experience. (10 minutes.) 

Part 2. Twenty problems in formal algebra. All the items included 
recur constantly in algebraic work. (10 minutes.) 

Part 3. The fundamentals of geometry. This part is of the true-false type but it involves knowledge of geometric relationships in that the student is in many cases forced to draw a figure in order to give a correct response. (40 items, 10 minutes.)

Part 4. Fifteen algebraic reasoning problems in which the mechanical computation is reduced to a minimum. (10 minutes.)

A general consideration of the examination shows that it is well 
balanced as to content. There are, however, very few sources avail- 
able which provide an objective basis for the selection of content for 
this type of examination. The excellent analysis of geometry by 
Welte (72) makes it possible to readily evaluate the content of a 
geometry test. For algebra no such analysis is available, making it necessary to utilize textbook analysis, syllabi, and reports such as
that of the National Committee on Mathematics Requirements (44) 
and of the National Council of Teachers of Mathematics (45). 


The results of the application of the partial and multiple correlation and regression technique, given in Table 75, show that for this sample the multiple coefficient, .366, is low. The zero order coefficient for the whole test, .360, was, however, considerably below .486, the average of thirteen coefficients. The central tendency of obtained coefficients is approximately .50, with very few below .40 or above .60. The regression coefficients show that the parts rank in predictive value in the order 3, 2, 1, 4. Part 4 contributes nothing to prediction which is not already cared for by the remaining parts, as is indicated by its small negative regression coefficient.

TABLE 75
PREDICTIVE POWER FOR FIRST SEMESTER GRADES IN MATHEMATICS OF THE VARIOUS
PARTS OF THE MATHEMATICS TRAINING EXAMINATION (REVISED) A
(MULTIPLE CORRELATIONS AND REGRESSION WEIGHTS)
MATHEMATICS TRAINING REVISED A (N = 199)

Variables
Variable 1. First semester mathematics grades
Variable 2. Score Part 1
Variable 3. Score Part 2
Variable 4. Score Part 3
Variable 5. Score Part 4

Zero Order Coefficients

Variable     2      3      4      5      A.M.     Sigma
   1       .238   .314   .323   .230     4.065   1.160
   2              .514   .364   .484    13.216   3.507
   3                     .570   .576    11.227   3.529
   4                            .529    10.920   3.681
   5                                     5.975   2.759

Multiple Coefficient
$R_{1.2345} = .366$

Regression Coefficients and Equations
$b_{12.345} = .029$
$b_{13.245} = .052$
$b_{14.235} = .066$
$b_{15.234} = -.006$

Deviation Form, Raw Scores
$x_1 = .029x_2 + .052x_3 + .066x_4 - .006x_5$

Deviation Form, z Scores
$z_1 = .088z_2 + .158z_3 + .209z_4 - .014z_5$
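The z score weights of Table 75 are the solution of the normal equations $R_{xx}\beta = r_{1x}$, where $R_{xx}$ holds the inter-correlations of the four parts and $r_{1x}$ their correlations with grades, and the multiple coefficient is $R = \sqrt{\sum_j \beta_j r_{1j}}$. The study does not describe how it solved these equations; the pure-Python sketch below uses Gaussian elimination with partial pivoting, a swapped-in method, and should agree with the printed weights and multiple coefficient to within rounding.

```python
import math

# Zero order correlations from Table 75 (Mathematics Training, Revised A, N = 199).
# Predictors: Parts 1-4 (variables 2-5); criterion: first semester grades (variable 1).
PREDICTOR_R = [
    [1.000, 0.514, 0.364, 0.484],
    [0.514, 1.000, 0.570, 0.576],
    [0.364, 0.570, 1.000, 0.529],
    [0.484, 0.576, 0.529, 1.000],
]
VALIDITIES = [0.238, 0.314, 0.323, 0.230]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [b] for row, b in zip(matrix, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(a[i][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n + 1):
                a[row][k] -= factor * a[col][k]
    x = [0.0] * n
    for row in reversed(range(n)):  # back substitution
        tail = sum(a[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (a[row][n] - tail) / a[row][row]
    return x


betas = solve(PREDICTOR_R, VALIDITIES)
multiple_r = math.sqrt(sum(b * r for b, r in zip(betas, VALIDITIES)))
print([round(b, 3) for b in betas])   # close to .088, .158, .209, -.014
print(round(multiple_r, 3))           # close to the printed .366
```

The small negative weight for Part 4 emerges directly from the correlation matrix: once Parts 1 to 3 are known, Part 4 adds nothing to the prediction of grades.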

The reliability coefficient for the whole test is .88. The part reli- 
abilities are shown in Table 76. 

TABLE 76
RELIABILITY COEFFICIENTS OF PARTS OF MATHEMATICS TRAINING
REV. A (N = 100)

Part 1    Part 2    Part 3    Part 4
 .79       .83       .81       .67
The reliability of the examination as a whole is fairly satisfactory, and the same is true of the parts except for Part 4, which should be improved. It is very difficult, however, to secure high reliability in an algebraic reasoning section for which but ten minutes of time can be given. It seems probable that if reliability coefficients of over .90 are to be obtained for a mathematics test, a two-hour testing period is necessary, because of the amount of time required per item when the solution of a problem is called for.
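Solving the Spearman-Brown formula for the length factor gives $n = r_t(1 - r)/(r(1 - r_t))$, the number of times a section must be lengthened with parallel items to raise its reliability from $r$ to a target $r_t$. As a sketch, assuming testing time grows in proportion to the number of items (an assumption, not a finding of the study), it can be applied to Part 4 of the training examination, reported at .67 for ten minutes in Table 76; the function name is hypothetical.

```python
def lengthening_factor(current, target):
    """How many times a test must be lengthened with parallel items to raise
    its reliability from `current` to `target` (Spearman-Brown solved for n)."""
    if not 0 < current < 1 or not 0 < target < 1:
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    return target * (1 - current) / (current * (1 - target))


# Part 4 (algebraic reasoning): reliability .67 in ten minutes; target .90.
n = lengthening_factor(0.67, 0.90)
print(round(n, 2), "times as long, about", round(10 * n), "minutes")
```

On these assumptions the algebraic reasoning section alone would need about 4.4 times its present length, roughly 44 minutes, to reach .90.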

The differences between the average z scores of the A-B and of the D-F groups (Table 77) show that the test as a whole, as judged by this sample, is but moderately high in discriminating between these two groups. For the reasons given above in the discussion of the multiple coefficient, the z score differences obtained here are lower than one would find with a more representative sample. For a sample of 175 cases with a zero order coefficient with first semester grades, the average z score difference for the whole test was 1.480, with 6.94 per cent of the inferior group reaching or exceeding the mean of the superior group.
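The two figures in the last sentence are linked if the D-F group's scores are taken to be normally distributed with a standard deviation of 1 in z units: the share of that group at or above the A-B mean is then $1 - \Phi(d)$, where $d$ is the z score difference and $\Phi$ the standard normal distribution function. The study does not state this method, but it reproduces the 1.480 and 6.94 pair and several entries of Tables 69 and 77. A minimal sketch with hypothetical function names:

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def percent_reaching_mean(z_difference):
    """Per cent of the D-F group at or above the A-B mean, assuming D-F scores
    are normal with unit standard deviation in z units."""
    return 100 * (1 - normal_cdf(z_difference))


print(round(percent_reaching_mean(1.480), 2))   # 6.94
print(round(percent_reaching_mean(0.98), 2))    # 16.35 (Table 69, whole test)
```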


TABLE 77
DIFFERENCES BETWEEN AVERAGE Z SCORES OF A-B AND D-F STUDENTS AND PERCENTAGE
OF D-F GROUP REACHING OR EXCEEDING MEAN OF A-B GROUP

                              Whole Test  Part 1  Part 2  Part 3  Part 4
Z Score Difference                .76       .52     .79     .84     .50
Percentage D-F Reaching or
  Exceeding Mean of A-B          22.36     31.15   21.48   20.05   31.85


The results of the item analysis are summarized in Table 78. 


TABLE 78
NUMBER OF ITEMS IN EACH PART SHOWING DIFFERENCES BETWEEN MEAN PERCENTAGE
CORRECT FOR A-B AND D-F GROUPS WHICH ARE THREE OR MORE AND FOUR OR MORE
TIMES THEIR PROBABLE ERRORS

          Total    No. 3 or More    No. 4 or More
          Items    Times P.E.'s     Times P.E.'s
Part 1     20          8                3
Part 2     20         12                6
Part 3     40         16                7
Part 4     15          4                2


If differences which are three or more times their probable errors are taken as significant, the parts rank in the order 2, 3, 1, and 4 as to the proportion of effective items.
In the arithmetic section, Part 1, the items which discriminated well were those involving fractions, squares, and roots. The number of effective items, however, is so small that no valid conclusions can be drawn. In Part 2, formal algebra problems, the items involving fractions, roots, and squares again discriminate well, as do those involving the manipulation of formulas. No principles were definitely apparent for explaining the success or failure of items in Part 3. Those items which could be solved readily by drawing an appropriate figure were effective, as were those which could be answered if the student knew a certain definite fact. Some were so easy that practically all the D-F students answered them correctly. For Part 4 no definite trends were discernible, and any that were present would not be reliable, as but four of the items are effective.

The analysis points to the following conclusions: (1) the test as 
a whole is valid in content; (2) Part 1 contributes significantly to 
prediction, is fairly reliable but contains many ineffective items which 
should be replaced; (3) for Parts 2 and 3 additional valid items are 
needed to replace those which are not effective; (4) Part 4 is not 
contributing to the prediction of grades, is low in reliability, and 
contains very few effective items. This part must be made effective 
and reliable or a different type of material substituted. 


EXPERIMENTAL PROCEDURES AND RESULTS 


Experimental forms were made up as follows: 


Form A-1 
Part 1A. Twenty-five arithmetic problems emphasizing the types of problems which proved effective in the placement test and types of problems not included in the placement test.

Part 2A. Twenty-five problems in algebra mechanics. The types of items found effective in Part 2 of the placement test were not emphasized in the experimental section, as this would tend to overload the section with a few types and omit many important principles. The aim was to build a representative test of algebra mechanics.
Form A-2


Part 3A. Forty geometry items emphasizing the most funda- 
mental relationships in plane geometry. Many items 
were included which could be readily and rapidly 
answered if appropriate figures were drawn. The 
aim was to include many items which placed a 
premium on ability to solve simple fundamental or- 
iginals. 

Part 4A. Twelve items involving the deriving of equations 
and formulas and the manipulations of formulas em- 
ploying literal numbers. 


Form B-1 
Part 2B. Same as Part 2A, Form A-1. 
Part 4B. Same as Part 4A, Form A-2. 


The results for the new materials are given below. 


TABLE 79
INTER-PART CORRELATIONS

Form A-1 (N = 248)
Part 1A vs. 2A = .618

Form A-2 (N = 221)
Part 3A vs. 4A = .553

Form B-1 (N = 253)
Part 2B vs. 4B = .580


The inter-part correlations (Table 79) show that the various new 
parts are but moderately correlated, the highest relationship being be- 
tween arithmetic and algebra mechanics, the next highest between 
algebra mechanics and algebraic formulas, and the lowest between ge- 
ometry and algebraic formulas. 


The correlations with first semester grades (Table 80) for Forms A-1 and A-2, while not high, indicate that the parts are fairly effective. Form B-1 is distinctly better than the other two forms. Of the parts which were built to be equivalent, 2A and 2B prove to be nearly so, while 4A and 4B show a considerable difference in means.

TABLE 80
CORRELATIONS WITH FIRST SEMESTER MATHEMATICS GRADES, MEANS AND STANDARD
DEVIATIONS

                        r     P.E.r    Mean     Standard Deviation
Form A-1 (N = 248)
Part 1A               .415    .035    13.810    4.052
Part 2A               .403    .036    18.944    4.356
Total                 .449    .034    33.192    7.401
Form A-2 (N = 221)
Part 3A               .402    .038    19.413    9.834
Part 4A               .344    .040     7.543    2.700
Total                 .427    .037    27.027   11.582
Form B-1 (N = 253)
Part 2B               .583    .028    18.176    4.670
Part 4B               .524    .031     6.411    2.259
Total                 .612    .026    21.868    6.266
TABLE 81
RELIABILITY COEFFICIENTS OF PARTS

Form A-1 (N = 251)        Form A-2 (N = 230)
Part 1A   .812            Part 3A   .841
Part 2A   .818            Part 4A   (illegible)

Form B-1 (N = 262)
Part 2B   .835
Part 4B   .635


The arithmetic section, Part 1A, in addition to being fairly effective for prediction, is very reliable (Table 81) for a single part. The algebra mechanics sections, Parts 2A and 2B, are very reliable, and 2B is very effective for prediction. Considering the essential equivalence and identity of type of content, the rather marked difference between 2A and 2B as to correlations with grades must be due to a considerable extent to sampling factors. The sections dealing with algebraic formulas, Parts 4A and 4B, are not as high in reliability as should be required. Part 3A, geometry, is high in reliability and but moderately effective for prediction.
TABLE 82
NUMBER OF ITEMS SHOWING DIFFERENCES BETWEEN MEAN PERCENTAGE CORRECT
FOR THE A-B AND D-F GROUPS WHICH ARE THREE OR MORE
TIMES THEIR PROBABLE ERRORS

              Total Items    No. 3 or More Times P.E.'s
Form A-1
Part 1A           25                 13
Part 2A           25                 15
Form A-2
Part 3A           40                 17
Part 4A           12                  7
Form B-1
Part 2B           25                 20
Part 4B           12                 11

The results of the item analysis (Table 82) show that Part 1A (new) is not much more effective than Part 1 (placement). The two combined, however, will provide for the construction of a section containing only reliable and effective items. The new algebra sections, Parts
2A and 2B, contain 60 and 80 per cent respectively of effective items. 
The placement algebra section, Part 2, gave 60 per cent of such items. 
The new geometry section, Part 3A, is no more effective than the 
related placement section, Part 3. The placement geometry section, 
however, was contributing most to prediction and was the most reli- 
able part of the placement examination. 

Parts 4A and 4B, designed to be equivalent sections, vary greatly 
as to the proportion of effective items, the first showing 58.3 per cent 
and the second 91.7 per cent of such items. Both new sections are 
superior to the related placement section which contains but 26.6 per 
cent of effective items. A revised section made up entirely of effective and reliable items is assured.

The results given above point to the following conclusions: (1) that 
the new parts are but fairly effective for the prediction of grades except 
for Form B-1, which is quite high in this respect; (2) that the new 
parts are high in reliability except 4A and 4B, which probably are 
not acceptable until improved; (3) that when the most effective 
items in the corresponding parts of the placement examinations and 
the new forms are combined a somewhat more effective examination 
will result; (4) that more extensive research should take into account the varying types of courses given as freshman mathematics and should determine the effect of these differences on the predictive value of the materials studied here, as well as on other types of materials which may be employed.


CHAPTER VIII
READING COMPREHENSION TESTS AS MEASURES OF 
APTITUDE 


The question as to the contribution of tests of reading comprehension to the predictive power of the aptitude tests may be answered by a consideration of the results already given. Unless otherwise stated, all correlation coefficients are based on 193 or more cases.

Reading comprehension is of course involved in all parts to a considerable extent. Certain parts, however, are primarily measures of reading comprehension and are not complicated by as many other factors as are tests requiring, in addition to reading, the solving of problems involving mathematical computation or the application of principles derived from reading. In the last type of test the student may read and understand the facts given but be unable to integrate them in such a way as to formulate sound judgments. To separate these various abilities is beyond the scope of this investigation.
Parts which are devised primarily as measures of reading comprehension, grouped according to the technique employed, are: (1) Part 3 of Chemistry Aptitude (Placement), Part 3 of English Aptitude (Placement), and Part 4 of Mathematics Aptitude (Placement). In all three of these parts reading materials are given with short units, such as phrases or clauses, underlined, and the student must select the underlined statements which answer correctly each of a number of questions based on the materials. Of the new materials, Part 3A of English Aptitude A-… and Parts 1B and 2B of English Aptitude B-1 employ the same type of technique. (2) The English Aptitude (Placement) Parts 2 and 4. These require the student to check the one statement in a set of three which correctly expresses an idea or principle given in the reading material. In the new English Aptitude the same type of test is employed in Parts 2A and 3B. (3) The new Chemistry Aptitude. This is a type of reading test which requires, in addition to reading and comprehending small units, the comprehension of an entire principle. This principle, if grasped, can easily be applied to the situations presented. Next an additional set of facts is presented, and this is followed by a third. The student must select the correct principle or employ two or even three of the principles in a given situation. The steps in securing a high score on this type of test would seem to be, in order, precise grasp of the facts presented, relating the facts to the principles, and applying the principles to the test situations. Parts 2A and 2B were designed to study this type of test. Part 5B is similar but brings in some arithmetical computation, which is, however, very simple, the major task being to grasp and utilize a principle.

The results for the parts employing the underlined statement tech- 
nique are briefly: Part 3 of Chemistry Aptitude (Placement) ranks 
second in size of the zero order coefficient and of the regression co- 
efficient, comparing very favorably with the most effective part of the 
examination. Part 3 of English Aptitude (Placement) is second in 
importance for prediction. For Mathematics Aptitude (Placement) 
Part 4 is significantly the most useful for prediction, outranking the 
other three parts combined in this respect. The reliability coefficients 
of these parts are .892, .691, and .683 respectively, which is high for 
chemistry but only fair (nearly .70) in the case of English and mathe- 
matics. As these tests require but fifteen minutes of time it is difficult 
to get high reliability. 

Of the new materials, English Aptitude (new) Parts 3A, 1B, and 2B correlate with grades .446, .430, and .459 respectively. These parts correlate but moderately with the other parts of the new English Aptitude forms.


For the type of comprehension test in the English Aptitude Examination requiring the student to check the one statement in a set of three which correctly expresses the idea in question, the results are: Part 2 correlates with first semester grades .506 and contributes significantly to prediction, ranking third of the four parts. Part 4 correlates with grades .686 and ranks first in the size of regression coefficient. Part 4 is, however, too unreliable to place any confidence in it as a measure of anything, its reliability coefficient being .395. Part 2 has a reliability coefficient of .712.


The new English Aptitude sections employing the same reading materials as those in the paragraph above, Parts 2A and 3B, correlate with grades .496 and .372 respectively. Part 2A has a reliability coefficient of .773 and 3B of .603. As Part 2A employs the same materials as Part 2 (Placement) and 3B as Part 4 (Placement), it would seem that both 3B and 4 are too unreliable for continued use but that 2 and 2A are fairly reliable.
The results for the type of reading test in Chemistry Aptitude requiring, in addition to the understanding of a fact or principle, that this principle be applied, are outlined below:

The correlation coefficient with grades for Part 2A is .588, for Part 2B .662, and for Part 5B .585. The reliability coefficients are .963, .972, and .907 respectively. The above coefficients are based on 237, 242, and 243 cases respectively.

Considering the limited nature of the data here presented, the conclusions which seem warranted are: (1) that the reading comprehension sections employing the underlined statement technique are among the most effective parts of the aptitude examinations in which they have been employed; (2) that the three-statement type of test, while effective for prediction for the parts considered here, is in the case of Parts 4 (Placement) and 3B (new) too unreliable for any conclusions to be safely drawn. In the case of Parts 2 and 2A the reliability is high enough for such short sections that, in view of their predictive power, they may be safely retained. They are, however, inferior to the parts employing the underlined statement technique; (3) that the type of reading test requiring the understanding and the integration of facts and principles is the best single type of test employed in either the placement or new forms. These are Parts 2A, 2B, and 5B of Chemistry Aptitude (new). This is true from the point of view of both predictive power and reliability; (4) that if any one single test section had to be relied upon as a measure of aptitude, a reading comprehension test based on the type of material representative of the subject in question would in general be superior to any other single type of test employed in the placement or experimental series.


CHAPTER IX 


GENERAL SUMMARY AND CONCLUSIONS 


The first step in the investigation consisted of an analysis of the 
following Iowa Placement Examinations: 


Chemistry Aptitude Series 1 Form A
Chemistry Training Series 1 Form A
English Aptitude Series 1 Form A
English Training Series 1 Form A
Foreign Language Aptitude Series 1 Form A
French Training Series 1 Form A
Spanish Training Series 1 Form A
Mathematics Aptitude Series 1 Form A
Mathematics Training Series 1 Form A


All the results reported by previous workers relating to the
reliability and predictive value of the examinations were compiled in
order to provide comparative data as a check on the results of this
investigation. A sample of 200 or more cases was secured for each
examination and checked as to its representativeness. The samples were
selected so that their means and standard deviations were as near as
possible to the norms based on a large number of cases. The
reliability was found for each part of each test as well as for the
whole test. The partial and multiple correlation and regression
techniques were applied to each sample. An item analysis was made to
determine the discriminatory power of each item by finding the
difference between the mean per cent of correct attempts for A and B
students and that for D and F students. The items showing high or low
discriminatory power were segregated for further study to determine
their characteristics. The differences between the average z scores of
the A-B group and of the D-F group were found for each test as a whole
and for each part, as one measure of discriminatory power.
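The item-analysis and group-difference measures described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's computing procedure: all function names are hypothetical, the A-B and D-F groups are assumed to be pooled before the per cent correct is taken, z scores are assumed to be computed against the whole sample using the population standard deviation, and the partial correlation shown is the standard first-order formula rather than a reproduction of the study's worksheets. No data from the study are used.

```python
from math import sqrt
from statistics import mean, pstdev


def percent_correct(attempts):
    """Per cent of correct attempts; `attempts` is a list of True/False values,
    one per student who attempted the item."""
    return 100.0 * sum(attempts) / len(attempts)


def item_discrimination(ab_attempts, df_attempts):
    """Per cent correct in the A-B group minus per cent correct in the D-F group.
    Assumption: A and B students are pooled into one group, D and F into another."""
    return percent_correct(ab_attempts) - percent_correct(df_attempts)


def z_scores(scores):
    """Standard scores relative to the whole sample (assumption: population SD)."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]


def mean_z_difference(scores, grades, high=("A", "B"), low=("D", "F")):
    """Mean z score of the A-B group minus mean z score of the D-F group."""
    z = z_scores(scores)
    high_z = [zi for zi, g in zip(z, grades) if g in high]
    low_z = [zi for zi, g in zip(z, grades) if g in low]
    return mean(high_z) - mean(low_z)


def pearson_r(x, y):
    """Product-moment correlation, e.g. between part scores and semester
    grades coded numerically (the coding is an assumption)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(r12, r13, r23):
    """First-order partial correlation r12.3: variables 1 and 2 with
    variable 3 held constant."""
    return (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


if __name__ == "__main__":
    # Hypothetical item: 3 of 4 A-B students and 1 of 4 D-F students correct.
    print(item_discrimination([True, True, True, False],
                              [True, False, False, False]))  # 50.0
    # Hypothetical test scores paired with the letter grade each student earned.
    print(round(mean_z_difference([10, 20, 30, 40], ["F", "D", "B", "A"]), 3))  # 1.789
```

In this sketch a larger positive difference marks an item (or a test) that separates the superior from the inferior students more sharply; an item with a difference near zero or below would be one of the non-functional items set aside for further study.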


Employing the results of the analysis and the principles derived from 
them, experimental forms were prepared in an attempt to bring 
about the needed changes. In Spanish and in French an entirely new
examination was prepared. For other examinations entirely new parts 
were constructed or additional items similar in form to those in the 
existing part were prepared to improve the effectiveness of that part. 
In the Chemistry Aptitude experimentation a new type of reading 
test was developed which involves the understanding of principles and 
their application to increasingly complex situations. 

The results presented for the new materials include inter-part corre- 
lations, reliability coefficients, correlation coefficients with first semester 
grades, means, standard deviations, and an item analysis of each part. 
In many cases results based on the same students were available for 
the placement examinations and for the new forms, making detailed 
comparisons possible. These were checked against results based on
different and much larger groups.

The conclusions given at the end of the discussion for each
examination will not be given again in detail. Some of the conclusions
are:


1. Parts 1 and 3 of the Iowa Placement Chemistry Aptitude Examination
are reliable and effective, and Parts 2 and 4 are not effective for
the prediction of grades. Part 4 is also low in reliability.

2. The type of reading test developed as a measure of aptitude in 
chemistry, referred to as involving the integration of principles 
at increasingly complex levels, is reliable and very effective for 
the prediction of grades. Further, it does not correlate highly
with the remaining parts of the examination.

3. The Iowa Placement Chemistry Training Examination is a valid
measure of achievement and is effective for predicting first semester
grades in college chemistry. It is also satisfactory in reliability.
The problem section should be more evenly scaled. The part testing
formulas, valence, and equations, while effective and reliable, could
be improved in objectivity of scoring.


4. The experimental Chemistry Training materials will provide a
problem section which is more effective and more reliable. The new
section testing knowledge of equations, formulas, and valence is
reliable and very effective for the prediction of grades. In addition,
it employs a more objective and more easily scored type of response
than the present placement part. The new parts designed to supplement
Parts 1 and 3 of the placement examination will provide a large
number of valid and effective items.


5. The Iowa Placement English Aptitude Examination is an effective
predictive instrument. Parts 1, 2, and 4 need to be improved, and a
substitute is needed for Part 3. The new materials will meet these
requirements.


6. The Iowa Placement English Training Examination is reliable and
effective except for Part 4, which is unreliable, ineffective, and of
doubtful validity. The spelling section (Part 1) is too easy. Parts 2
and 3, testing punctuation and grammar, contain a large proportion of
ineffective items for which valid items should be substituted.

7. a. The experimental results indicate that the new English Training
spelling sections are not significantly better than the related
placement section. A comparison of the multiple-choice and
error-correction techniques as measures of spelling ability reveals no
significant differences in predictive power. The error-correction
technique is slightly more reliable and shows a larger proportion of
effective items.

b. The new punctuation and grammar sections will provide valid and
effective items to replace those in the placement parts which were
ineffective.


c. The new part designed to replace Part 4 of the placement ex- 
amination is distinctly superior to Part 4 in every respect. 

8. The Iowa Placement French Training Examination is in general fairly
valid as to content and is fairly effective for the prediction of
first semester grades. The vocabulary section can be markedly improved
as to content validity. The grammar section is valid and effective as
it is. A more objective type of response, however, would be desirable.
The verb section shows a poor distribution of the tenses tested. The
reading section (Part 4) is valid and effective.


9. a. The experimental French materials provide the following: (1) a
more valid vocabulary section; (2) an improved grammar section; (3) a
more valid verb section.

b. The new type of reading section employing the true-false type of
response is slightly less effective than the related placement part.


c. In testing knowledge of grammar, the multiple-choice technique is
as reliable and effective as the error-correction technique.
Similarly, in the vocabulary section the multiple-choice type of
response is as effective as the recall type of response.

d. The idiom section, while very difficult, is reliable and effective
and could very well be employed as a sub-test. Similarly, the new
French-to-English verb test proved reliable and effective. Neither is
employed in the present placement examination.

e. A vocabulary section based on words within the first 2000 in
frequency of occurrence is more effective for placement than one
employing words above this point. Similarly, for the grammar and the
verb sections the most essential and fundamental principles are found
to be the most effective.


10. The Iowa Placement Spanish Training Examination is reliable and
effective for the prediction of grades. The vocabulary section is only
fairly valid as to content. The grammar section is very effective and
is valid as to content. The verb section is exceedingly difficult and
not very valid in content, as it over-emphasizes certain tenses and
contains very few items for certain important tenses. The reading
section is effective and valid.

11. a. The new Spanish forms provide materials for the correction of
the defects found in the placement examination.

b. The experimental results show that the multiple-choice technique is
as reliable and as effective for the prediction of grades as the
error-correction technique so far as the grammar sections are
concerned.

c. The experimental verb test (multiple-choice) is more valid than the
related placement section.

d. The new Spanish-to-English verb section is reliable and effective
and could be advantageously employed as a sub-section in the
reconstructed forms.

e. In the reading section, the written-answer response is probably
superior to the true-false response.

f. Words which fall within the first 1500 in frequency of occurrence
are more effective than those above this point in discriminating
between superior and inferior students. Similarly, the most
fundamental grammatical principles are more effective in this respect
than the less fundamental and less common principles.
12. The Iowa Placement Foreign Language Aptitude Examination is
performing very effectively from every point of view.


13. The Iowa Placement Mathematics Aptitude Examination is a fairly
effective instrument for the prediction of grades. Part 3 is of little
value and should be eliminated. Parts 1 and 2 contain many
non-functional items which should be eliminated. Part 4 is effective
as it is and is fairly reliable.


14. The experimental Mathematics Aptitude Forms provide new types of
materials which may be substituted for Part 3 of the placement
examination. They are superior to that part from every point of view.
The new forms also make it possible to overcome some of the defects
given above under conclusion number 13.


15. The Iowa Placement Mathematics Training Examination is valid in
content. The changes needed are to secure effective and valid items to
replace those which are ineffective or invalid. Part 4 especially
needs to be improved in effectiveness and reliability.


16. A combination of the best materials from the experimental
Mathematics Training Forms and the placement examination will provide
a somewhat more effective examination.


17. The results of the investigation in every subject indicate that
the best type of training examination for placement purposes is one
made up of items and parts which emphasize the most common and
fundamental principles of the subject rather than the less common and
less essential elements.


18. Reading comprehension tests which demand the understanding and
the application of principles, and which are based upon the particular
subject to be taken, constitute the best single measure of aptitude
employed in these examinations.

